-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
admission: insufficient tokens under high write load and WAL failover #138655
Labels
A-admission-control
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-perturbation
Bugs found by the perturbation framework
T-admission-control
Admission Control
Comments
sumeerbhola
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-admission-control
T-admission-control
Admission Control
labels
Jan 8, 2025
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Jan 8, 2025
The current value of 1000 was too inconsistent with Pebble's compaction scoring, in that compaction scoring had 1 sub-level == 500 L0 files, and admission control had 1 sub-level == 50 L0 files. The value is increased to 4000, so 1 sub-level == 200 L0 files. There is a long code comment elaborating on this. Informs cockroachdb#138655 Epic: none Release note (ops change): The default value of cluster setting admission.l0_file_count_overload_threshold is changed to 4000.
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Jan 9, 2025
WAL failover can happen even when the throughput sustainable by the primary is high (since we tune WAL failover to happen at a 100ms latency threshold). So if the workload increase is resulting in more compactions, we should incorporate that into the tokens given out by the ioLoadListener. Informs cockroachdb#138655 Epic: none Release note: None
craig bot
pushed a commit
that referenced
this issue
Jan 9, 2025
138699: admission: increase file count based overload threshold r=aadityasondhi a=sumeerbhola The current value of 1000 was too inconsistent with Pebble's compaction scoring, in that compaction scoring had 1 sub-level == 500 L0 files, and admission control had 1 sub-level == 50 L0 files. The value is increased to 4000, so 1 sub-level == 200 L0 files. There is a long code comment elaborating on this. Informs #138655 Epic: none Release note (ops change): The default value of cluster setting admission.l0_file_count_overload_threshold is changed to 4000. Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Jan 10, 2025
WAL failover can happen even when the throughput sustainable by the primary is high (since we tune WAL failover to happen at a 100ms latency threshold). So if the workload increase is resulting in more compactions, we should incorporate that into the tokens given out by the ioLoadListener. Informs cockroachdb#138655 Epic: none Release note: None
craig bot
pushed a commit
that referenced
this issue
Jan 10, 2025
138708: admission: incorporate higher compaction rate during WAL failover r=aadityasondhi a=sumeerbhola WAL failover can happen even when the throughput sustainable by the primary is high (since we tune WAL failover to happen at a 100ms latency threshold). So if the workload increase is resulting in more compactions, we should incorporate that into the tokens given out by the ioLoadListener. Informs #138655 Epic: none Release note: None Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Jan 10, 2025
WAL failover can happen even when the throughput sustainable by the primary is high (since we tune WAL failover to happen at a 100ms latency threshold). So if the workload increase is resulting in more compactions, we should incorporate that into the tokens given out by the ioLoadListener. Informs cockroachdb#138655 Epic: none Release note: None
craig bot
pushed a commit
that referenced
this issue
Jan 10, 2025
138640: kserver: enable leader leases for TestFlowControlCrashedNodeV2 r=miraradeva a=miraradeva In 2c89915, TestFlowControlCrashedNode and TestFlowControlCrashedNodeV2 were set up to run with leader leases. TestFlowControlCrashedNodeV2 soon became flaky (#136292) because after one of the two nodes was stopped, the other node (leader) could step down due to `CheckQuorum`. TestFlowControlCrashedNode flaked similarly under race. This commit adds a third node to the tests, so the leader can retain leadership in the presence of a single crash. It also enables leader leases back for TestFlowControlCrashedNodeV2. Part of: #136806 Release note: None 138708: admission: incorporate higher compaction rate during WAL failover r=aadityasondhi a=sumeerbhola WAL failover can happen even when the throughput sustainable by the primary is high (since we tune WAL failover to happen at a 100ms latency threshold). So if the workload increase is resulting in more compactions, we should incorporate that into the tokens given out by the ioLoadListener. Informs #138655 Epic: none Release note: None Co-authored-by: Mira Radeva <[email protected]> Co-authored-by: sumeerbhola <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-admission-control
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-perturbation
Bugs found by the perturbation framework
T-admission-control
Admission Control
Observations from https://cockroachlabs.slack.com/archives/C06UFBJ743F/p1736348529278319?thread_ts=1736262662.853989&cid=C06UFBJ743F:
A single flush adds a huge amount of data (~1.6GiB and 800 files) to L0. AC's
ioLoadListener
notices this at the same time as Pebble compactions (out of L0) start ramping up. Within 30s the backlog is cleared. During this time period, AC tokens are insufficient for regular work. The memtable backlog did not cause a write stall since WAL failover was active (the primary disk was not actually stalled, but slower because of high bandwidth usage -- this test was not using AC's bandwidth limiting).The
ioLoadListener
can use a low value of compaction bandwidth for deciding on tokens for various reasons:AC uses 20 sublevels == 1000 files in its scoring. This is wildly inconsistent. It the latter had a higher file count threshold, AC would give out more tokens.
ioLoadListener
stops changing its estimates of the compacted bytes. This means that if WAL failover is only due to slight slowness and the compactions out of L0 are successfully ramping up,ioLoadListener
will ignore this ramp up and give out fewer tokens. It should update its estimates if the observed compaction bytes are increasing.Jira issue: CRDB-46185
The text was updated successfully, but these errors were encountered: