Move the TreeDesc fields and finish adding cache line markers to State #279
This PR is stacked on top of #278. It groups the smaller fields into cache lines to make future changes easier, such as reducing usize to u16 and removing fields that are faster to calculate.
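As a rough illustration of what grouping small fields into a cache line can look like in Rust, here is a minimal sketch. The struct and field names are hypothetical and are not the actual zlib-rs `State` layout, and the 64-byte line size is an assumption that holds on common x86_64 and aarch64 cores but not everywhere.

```rust
use std::mem::{align_of, size_of};

/// Assumed cache line size in bytes (not taken from the PR).
const CACHE_LINE: usize = 64;

/// Hypothetical group of small, frequently accessed fields.
/// `repr(C)` keeps the declared field order, and `align(64)` makes the
/// group start on a cache line boundary and pads it to a full line.
#[repr(C, align(64))]
pub struct HotLine {
    pub strstart: u16,
    pub match_start: u16,
    pub match_length: u16,
    pub lookahead: u16,
    pub ins_h: u32,
    pub block_start: i32,
}

/// Four counters stored as `usize`, for comparison.
#[repr(C)]
pub struct WideCounters {
    pub strstart: usize,
    pub match_start: usize,
    pub match_length: usize,
    pub lookahead: usize,
}

/// The same four counters narrowed to `u16`.
#[repr(C)]
pub struct NarrowCounters {
    pub strstart: u16,
    pub match_start: u16,
    pub match_length: u16,
    pub lookahead: u16,
}

// Compile-time checks: the build fails if the group stops fitting
// exactly one (assumed) cache line.
const _: () = assert!(size_of::<HotLine>() == CACHE_LINE);
const _: () = assert!(align_of::<HotLine>() == CACHE_LINE);

/// Bytes saved by narrowing the four counters from `usize` to `u16`.
pub fn bytes_saved_by_narrowing() -> usize {
    size_of::<WideCounters>() - size_of::<NarrowCounters>()
}
```

The compile-time assertions play the role of a cache line marker: adding or widening a field so that the group spills past one line becomes a build error instead of a silent layout change. Narrowing only works if every value stored in those fields is known to fit in 16 bits.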
Rebasing fixed some of the problems, but I can't reproduce the CI test failure on aarch64.
I can't reproduce them either; maybe GitHub CI is having issues? I just retried them without any success, but maybe it takes a little while. (The only reason I can think of is some sort of file/disk corruption.)
The GitHub Actions problem should be resolved: https://www.githubstatus.com/incidents/dk61qxd21mtl However, I just retried CI and it is still broken.
Well, in https://github.com/trifectatechfoundation/zlib-rs/pull/280/files I made only whitespace changes, and CI still fails with the same error as here. I just don't see how that could be us.
I'm getting some good instruction count wins locally now, and actual wall_time improvements in some cases:
Benchmark 2 (62 runs): target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time         81.7ms ± 935us    80.4ms … 87.4ms    5 ( 8%)     -  1.3% ±  0.4%
  peak_rss          26.7MB ± 50.5KB   26.6MB … 26.7MB   11 (18%)     +  0.4% ±  0.1%
  cpu_cycles          299M ± 2.72M      296M … 315M      2 ( 3%)   ⚡-  1.6% ±  0.3%
  instructions        601M ± 274        601M … 601M      0 ( 0%)   ⚡-  9.1% ±  0.0%
  cache_references   19.9M ± 166K      19.7M … 20.6M     2 ( 3%)     +  0.5% ±  0.3%
  cache_misses        393K ± 84.7K      293K … 714K      1 ( 2%)   ⚡- 10.7% ±  7.1%
  branch_misses      2.98M ± 4.51K     2.97M … 2.99M     0 ( 0%)   💩+  1.5% ±  0.1%
Benchmark 2 (37 runs): target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time          135ms ± 3.56ms    133ms … 153ms     2 ( 5%)     -  1.3% ±  0.9%
  peak_rss          25.0MB ± 59.0KB   24.9MB … 25.0MB    0 ( 0%)     -  0.0% ±  0.1%
  cpu_cycles          546M ± 15.4M      538M … 629M      2 ( 5%)     -  1.4% ±  1.0%
  instructions       1.12G ± 280       1.12G … 1.12G     0 ( 0%)   ⚡-  4.9% ±  0.0%
  cache_references   34.3M ± 394K      33.7M … 35.5M     1 ( 3%)     +  0.0% ±  0.5%
  cache_misses       1.05M ± 233K       789K … 1.65M     2 ( 5%)     +  3.4% ± 10.0%
  branch_misses      7.01M ± 3.71K     7.00M … 7.02M     1 ( 3%)     +  0.7% ±  0.0%
Benchmark 2 (34 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time          151ms ± 1.55ms    148ms … 155ms     0 ( 0%)   ⚡-  2.9% ±  0.5%
  peak_rss          24.7MB ± 66.4KB   24.6MB … 24.8MB    0 ( 0%)     -  0.1% ±  0.1%
  cpu_cycles          615M ± 6.70M      607M … 637M      1 ( 3%)   ⚡-  3.4% ±  0.5%
  instructions       1.46G ± 250       1.46G … 1.46G     0 ( 0%)   ⚡-  3.7% ±  0.0%
  cache_references   43.9M ± 630K      43.1M … 45.3M     5 (15%)     -  0.7% ±  0.6%
  cache_misses       1.16M ± 338K       850K … 2.21M     7 (21%)     +  7.8% ± 14.1%
  branch_misses      7.86M ± 4.05K     7.85M … 7.87M     0 ( 0%)   💩+  1.0% ±  0.0%
Benchmark 2 (29 runs): target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time          173ms ± 2.75ms    172ms … 186ms     2 ( 7%)   ⚡-  3.4% ±  1.6%
  peak_rss          24.6MB ± 61.7KB   24.5MB … 24.6MB    0 ( 0%)     -  0.1% ±  0.1%
  cpu_cycles          722M ± 8.77M      715M … 762M      3 (10%)   ⚡-  3.4% ±  1.4%
  instructions       1.53G ± 293       1.53G … 1.53G     0 ( 0%)   ⚡-  3.5% ±  0.0%
  cache_references   62.7M ± 701K      61.9M … 65.1M     1 ( 3%)     -  0.4% ±  0.6%
  cache_misses       1.82M ± 356K      1.42M … 2.79M     2 ( 7%)   💩+ 28.4% ± 12.5%
  branch_misses      8.54M ± 17.8K     8.52M … 8.60M     5 (17%)     +  0.9% ±  0.1%
Benchmark 1 (26 runs): target/release/examples/compress-baseline 5 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time          189ms ± 1.46ms    187ms … 193ms     1 ( 4%)   ⚡-  2.2% ±  0.7%
  peak_rss          24.6MB ± 47.4KB   24.5MB … 24.6MB    4 (15%)     +  0.1% ±  0.1%
  cpu_cycles          792M ± 5.36M      785M … 808M      0 ( 0%)   ⚡-  2.5% ±  0.8%
  instructions       1.74G ± 273       1.74G … 1.74G     0 ( 0%)   ⚡-  2.9% ±  0.0%
  cache_references   68.6M ± 680K      67.8M … 69.9M     0 ( 0%)     +  0.1% ±  0.6%
  cache_misses       1.87M ± 344K      1.42M … 2.56M     0 ( 0%)   💩+ 17.6% ± 13.8%
  branch_misses      9.15M ± 7.72K     9.14M … 9.17M     0 ( 0%)     +  0.9% ±  0.0%
Benchmark 2 (22 runs): target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time          228ms ± 3.56ms    224ms … 238ms     3 (14%)     -  0.6% ±  0.7%
  peak_rss          24.6MB ± 51.7KB   24.5MB … 24.6MB    4 (18%)     +  0.0% ±  0.1%
  cpu_cycles          975M ± 14.5M      962M … 1.02G     1 ( 5%)     -  0.9% ±  0.7%
  instructions       1.88G ± 313       1.88G … 1.88G     0 ( 0%)   ⚡-  2.6% ±  0.0%
  cache_references    107M ± 923K       105M … 109M      0 ( 0%)     +  0.9% ±  0.4%
  cache_misses       2.53M ± 484K      1.98M … 3.65M     0 ( 0%)   💩+ 18.3% ± 11.9%
  branch_misses      9.37M ± 21.1K     9.35M … 9.42M     3 (14%)     +  1.0% ±  0.1%
Benchmark 2 (12 runs): target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement         mean ± σ            min … max         outliers         delta
  wall_time          431ms ± 5.06ms    428ms … 447ms     1 ( 8%)     -  0.4% ±  0.7%
  peak_rss          24.4MB ± 87.6KB   24.2MB … 24.5MB    0 ( 0%)     -  0.1% ±  0.3%
  cpu_cycles         1.91G ± 20.5M     1.90G … 1.98G     1 ( 8%)     -  0.3% ±  0.7%
  instructions       3.32G ± 713       3.32G … 3.32G     1 ( 8%)   ⚡-  1.8% ±  0.0%
  cache_references    197M ± 1.23M      195M … 199M      1 ( 8%)     +  0.5% ±  0.4%
  cache_misses       3.56M ± 728K      2.51M … 4.65M     0 ( 0%)   💩+ 77.8% ± 25.5%
  branch_misses      19.0M ± 37.5K     18.9M … 19.1M     1 ( 8%)     -  0.5% ±  0.2%
No performance regression measured at compression levels 1-9 on my test system.