
admission: insufficient tokens under high write load and WAL failover #138655

Closed
sumeerbhola opened this issue Jan 8, 2025 · 0 comments
Assignees
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-perturbation Bugs found by the perturbation framework T-admission-control Admission Control

Comments

@sumeerbhola
Collaborator

sumeerbhola commented Jan 8, 2025

Observations from https://cockroachlabs.slack.com/archives/C06UFBJ743F/p1736348529278319?thread_ts=1736262662.853989&cid=C06UFBJ743F:

A single flush added a huge amount of data (~1.6GiB and 800 files) to L0. AC's ioLoadListener noticed this at the same time as Pebble compactions (out of L0) started ramping up. Within 30s the backlog was cleared. During this time period, AC tokens were insufficient for regular work. The memtable backlog did not cause a write stall, since WAL failover was active (the primary disk was not actually stalled, just slower because of high bandwidth usage -- this test was not using AC's bandwidth limiting).

The ioLoadListener can end up using a low value of compaction bandwidth when deciding on tokens, for several reasons:

  • [AC deficiency] Compactions haven't yet ramped up enough. The ioLoadListener samples at 15s intervals. The assumption has been that if it sees L0 overload, Pebble has been seeing it for a few seconds. This doesn't apply when all the L0 bytes were added via a single flush. In the above experiment, there probably wasn't a ramp-up issue, since 800 files result in an uncompensated score of 1.6, and the 2 sublevels (prior to the flush) result in a score of 2.0. The store was running with a compaction concurrency of 3, and L0 was competing with L3 and L4, which also had high scores.
  • [AC/Pebble inconsistency] Pebble uses L0CompactionThreshold=2, L0CompactionFileThreshold=500. 1 sublevel results in a score of 1, and 500 files result in a score of 1, so 1 sublevel == 500 files.
    AC uses 20 sublevels == 1000 files in its scoring, i.e. 1 sublevel == 50 files. This is wildly inconsistent. If the latter had a higher file count threshold, AC would give out more tokens (see the sketch after this list).
  • [AC deficiency] During WAL failover, the ioLoadListener stops changing its estimates of the compacted bytes. This means that if WAL failover is only due to slight slowness and the compactions out of L0 are successfully ramping up, the ioLoadListener will ignore this ramp-up and give out fewer tokens. It should update its estimates if the observed compaction bytes are increasing.
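
To make the inconsistency concrete, here is a minimal Go sketch of the arithmetic, using only the equivalences quoted above; the function name and the idea of converting a file count into a sub-level-equivalent are illustrative, not the actual Pebble or admission control code:

```go
// Illustrative arithmetic only, based on the equivalences stated in this issue.
package main

import "fmt"

// filesAsSubLevels converts an L0 file count into an equivalent sub-level
// count, given how many files a scorer treats as equal to one sub-level.
func filesAsSubLevels(files int, filesPerSubLevel float64) float64 {
	return float64(files) / filesPerSubLevel
}

func main() {
	const files = 800 // added to L0 by the single large flush in the incident

	// Pebble: L0CompactionThreshold=2, L0CompactionFileThreshold=500,
	// i.e. 1 sub-level == 500 files.
	fmt.Println(filesAsSubLevels(files, 500)) // 1.6: a modest backlog in Pebble's terms

	// AC: 20 sub-levels == 1000 files, i.e. 1 sub-level == 50 files.
	fmt.Println(filesAsSubLevels(files, 50)) // 16: close to AC's 20-sub-level overload point
}
```

Roughly, the same 800 files look like ~1.6 sub-levels' worth of backlog to Pebble but ~16 sub-levels' worth to AC, which is why AC throttles regular work hard for a backlog Pebble considers modest.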

Jira issue: CRDB-46185

@sumeerbhola sumeerbhola added C-enhancement, A-admission-control, and T-admission-control labels Jan 8, 2025
@sumeerbhola sumeerbhola self-assigned this Jan 8, 2025
@andrewbaptist andrewbaptist added the O-perturbation Bugs found by the perturbation framework label Jan 8, 2025
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 8, 2025
The current value of 1000 was too inconsistent with Pebble's compaction
scoring, in that compaction scoring had 1 sub-level == 500 L0 files, and
admission control had 1 sub-level == 50 L0 files. The value is increased
to 4000, so 1 sub-level == 200 L0 files. There is a long code comment
elaborating on this.

Informs cockroachdb#138655

Epic: none

Release note (ops change): The default value of cluster setting
admission.l0_file_count_overload_threshold is changed to 4000.
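
For reference, the equivalence implied by the new default, assuming the 20-sub-level normalization described above stays unchanged (my arithmetic, not code from the PR):

```go
package main

import "fmt"

func main() {
	const (
		subLevelThreshold = 20   // AC's sub-level overload threshold (unchanged)
		fileThreshold     = 4000 // new default for admission.l0_file_count_overload_threshold
	)
	filesPerSubLevel := fileThreshold / subLevelThreshold
	fmt.Println(filesPerSubLevel)       // 200: 1 sub-level == 200 L0 files
	fmt.Println(800 / filesPerSubLevel) // 4: the incident's 800-file flush, in sub-level terms
}
```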
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 9, 2025
WAL failover can happen even when the throughput sustainable by the primary
is high (since we tune WAL failover to happen at a 100ms latency threshold).
So if the workload increase is resulting in more compactions, we should
incorporate that into the tokens given out by the ioLoadListener.

Informs cockroachdb#138655

Epic: none

Release note: None
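
A minimal sketch of the idea, assuming the ioLoadListener keeps a smoothed estimate of bytes compacted out of L0 per 15s interval; the type, field names, and smoothing below are hypothetical, not the code from this PR:

```go
package main

import "fmt"

// compactionEstimator is a hypothetical stand-in for the part of the
// ioLoadListener that tracks how many bytes compactions move out of L0.
type compactionEstimator struct {
	smoothedCompactionBytes int64 // estimate used when computing L0 byte tokens
}

// update adjusts the estimate for one 15s interval. During WAL failover the
// estimate is not smoothed as usual, but an observed increase in compacted
// bytes is still incorporated, so a compaction ramp-up yields more tokens
// instead of being ignored.
func (e *compactionEstimator) update(observedBytes int64, walFailoverActive bool) {
	if walFailoverActive {
		if observedBytes > e.smoothedCompactionBytes {
			e.smoothedCompactionBytes = observedBytes
		}
		return
	}
	// Outside failover: simple exponential smoothing (illustrative only).
	e.smoothedCompactionBytes = (e.smoothedCompactionBytes + observedBytes) / 2
}

func main() {
	e := &compactionEstimator{smoothedCompactionBytes: 100 << 20}
	e.update(400<<20, true)                      // WAL failover active, compactions ramping up
	fmt.Println(e.smoothedCompactionBytes >> 20) // 400 (MiB): the ramp-up is not discarded
}
```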
craig bot pushed a commit that referenced this issue Jan 9, 2025
138699: admission: increase file count based overload threshold r=aadityasondhi a=sumeerbhola

The current value of 1000 was too inconsistent with Pebble's compaction scoring, in that compaction scoring had 1 sub-level == 500 L0 files, and admission control had 1 sub-level == 50 L0 files. The value is increased to 4000, so 1 sub-level == 200 L0 files. There is a long code comment elaborating on this.

Informs #138655

Epic: none

Release note (ops change): The default value of cluster setting admission.l0_file_count_overload_threshold is changed to 4000.

Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 10, 2025
WAL failover can happen even when the throughput sustainable by the primary
is high (since we tune WAL failover to happen at a 100ms latency threshold).
So if the workload increase is resulting in more compactions, we should
incorporate that into the tokens given out by the ioLoadListener.

Informs cockroachdb#138655

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Jan 10, 2025
138708: admission: incorporate higher compaction rate during WAL failover r=aadityasondhi a=sumeerbhola

WAL failover can happen even when the throughput sustainable by the primary
is high (since we tune WAL failover to happen at a 100ms latency threshold).
So if the workload increase is resulting in more compactions, we should
incorporate that into the tokens given out by the ioLoadListener.

Informs #138655

Epic: none

Release note: None


Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 10, 2025
WAL failover can happen even when the throughput sustainable by the primary
is high (since we tune WAL failover to happen at a 100ms latency threshold).
So if the workload increase is resulting in more compactions, we should
incorporate that into the tokens given out by the ioLoadListener.

Informs cockroachdb#138655

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Jan 10, 2025
138640: kvserver: enable leader leases for TestFlowControlCrashedNodeV2 r=miraradeva a=miraradeva

In 2c89915, TestFlowControlCrashedNode and TestFlowControlCrashedNodeV2 were set up to run with leader leases. TestFlowControlCrashedNodeV2 soon became flaky (#136292) because after one of the two nodes was stopped, the other node (leader) could step down due to `CheckQuorum`. TestFlowControlCrashedNode flaked similarly under race.

This commit adds a third node to the tests, so the leader can retain leadership in the presence of a single crash. It also re-enables leader leases for TestFlowControlCrashedNodeV2.

Part of: #136806

Release note: None

138708: admission: incorporate higher compaction rate during WAL failover r=aadityasondhi a=sumeerbhola

WAL failover can happen even when the throughput sustainable by the primary
is high (since we tune WAL failover to happen at a 100ms latency threshold).
So if the workload increase is resulting in more compactions, we should
incorporate that into the tokens given out by the ioLoadListener.

Informs #138655

Epic: none

Release note: None


Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>