scx_prev: a simple scheduler tested on OLTP workloads #1275
A FIFO-only variation on scx_simple whose CPU selection prioritizes an idle previous CPU over a fully idle core, rather than preferring a fully idle core as scx_simple and scx_rusty do.

scx_prev outperforms a few other schedulers (eevdf, scx_rusty, and scx_simple in the table below) on OLTP workloads run on systems with a relatively flat topology (i.e. non-NUMA, single LLC). It does so by changing CPU selection as above and by taking advantage of the more aggressive work conservation (i.e. idle balancing) that sched_ext provides by default.

It's far from being a full-fledged scheduler, but it demonstrates how a small change to an existing scheduler can improve performance in a real application.

Notes:
- Server: AMD EPYC 7J13 (16-CPU VM) running a v6.12-based UEK-next kernel, scx (688bffc "Merge pull request sched-ext#1192 from devnexen/code_simpl3"), and MySQL Community Edition 8.4[0].
- Client: AMD EPYC 7551 (128-CPU bare metal) running BMK[1], a sysbench-based BenchMark Kit.
- Each data point in the table below is the average of ten one-minute runs done after a three-minute warmup. The server is rebooted between schedulers.
- "cli" is the number of database clients.
- Each %diff column is relative to eevdf.
Representative BMK testcase: sb11-OLTP_RO_10M_8tab-uniform-ps-notrx.sh

cli  eevdf (std%)  rusty (std%)   %diff  simple (std%)   %diff   prev (std%)   %diff
---  ------------  ------------  ------  -------------  ------  ------------  ------
throughput
 16    4140 ( 1%)    4224 ( 1%)  (  2%)     4276 ( 2%)  (  3%)    4263 ( 1%)  (  3%)
 32    7382 ( 1%)    7259 ( 1%)  ( -2%)     7314 ( 1%)  ( -1%)    7919 ( 1%)  (  7%)
 48    9015 ( 0%)    9644 ( 0%)  (  7%)    10055 ( 0%)  ( 12%)   10411 ( 1%)  ( 15%)
 64    9765 ( 1%)    9601 ( 0%)  ( -2%)    10214 ( 0%)  (  5%)   10481 ( 0%)  (  7%)
average latency
 16       4 ( 1%)       4 ( 1%)  ( -2%)        4 ( 2%)  ( -3%)       4 ( 1%)  ( -3%)
 32       4 ( 1%)       4 ( 1%)  (  2%)        4 ( 1%)  (  1%)       4 ( 1%)  ( -7%)
 48       5 ( 0%)       5 ( 0%)  ( -7%)        5 ( 0%)  (-10%)       5 ( 1%)  (-13%)
 64       7 ( 1%)       7 ( 0%)  (  2%)        6 ( 0%)  ( -4%)       6 ( 0%)  ( -7%)
95p latency
 16       4 ( 3%)       4 ( 2%)  ( -4%)        4 ( 4%)  ( -1%)       4 ( 4%)  ( -7%)
 32       5 ( 2%)       5 ( 1%)  (  1%)        5 ( 2%)  (  1%)       4 ( 2%)  (-11%)
 48       7 ( 1%)       6 ( 1%)  (-16%)        5 ( 1%)  (-24%)       5 ( 1%)  (-26%)
 64       9 ( 3%)       8 ( 0%)  (-12%)        7 ( 0%)  (-26%)       7 ( 1%)  (-26%)

In this read-only workload, prev matches or beats eevdf in throughput, average latency, and 95p latency at every client count, with its largest throughput gain at 48 clients (15%). It is also at least as good as rusty and simple everywhere except 16-client throughput, where simple is marginally ahead (4276 vs. 4263).

[0] https://github.com/mysql/mysql-server/tree/8.4
[1] http://dimitrik.free.fr/blog/posts/mysql-perf-bmk-kit.html

Signed-off-by: Daniel Jordan <[email protected]>
Looks great! If you'd like to elaborate on the results and your use case, we have a Slack and meet every Tuesday at 11AM EST.
{
	s32 cpu;

	if (p->nr_cpus_allowed == 1) {
I think this condition is always false: ops.select_cpu() is always skipped if the task can only run on one CPU.
That's true, thanks. I see how ->select_cpu() is always skipped by the in-kernel scheduler core when nr_cpus_allowed == 1. I'll send a follow-up deleting the unused branch.
As Andrea points out[0], select_cpu() is never called for tasks with nr_cpus_allowed == 1, so the branch that checks for them is dead code. Remove it.

[0] sched-ext#1275

Signed-off-by: Daniel Jordan <[email protected]>