Move the TreeDesc fields and finish adding cache line markers to State #279

Merged
folkertdev merged 2 commits into trifectatechfoundation:main on Jan 9, 2025

Conversation

brian-pane

I measured no performance regression at compression levels 1-9 on my test system.

@brian-pane
Author

This PR is stacked on top of #278. It groups the smaller fields into cache lines to make future changes easier, such as narrowing usize fields to u16 and removing fields that are faster to calculate on demand than to store.
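
To illustrate the grouping idea, here is a minimal sketch. It is not the actual zlib-rs State: the struct name, field names, marker fields, and sizes are all hypothetical. It assumes 64-byte cache lines and uses fixed-width integers so the layout is identical on every target. Zero-sized marker fields record where each cache line starts, and compile-time assertions fail the build if a later field change shifts a boundary.

```rust
use core::mem::{align_of, offset_of, size_of};

/// Assumed cache line size in bytes (typical for x86_64 and many aarch64 cores).
pub const CACHE_LINE: usize = 64;

/// Hypothetical state struct, NOT the real zlib-rs `State`.
/// `repr(C)` keeps the fields in declaration order; `align(64)` makes the
/// struct start on a cache line boundary.
#[repr(C, align(64))]
pub struct PackedState {
    // Cache line 0: eight 8-byte fields, 64 bytes in total.
    _cache_line_0: (),
    pub window_ptr: u64,
    pub window_size: u64,
    pub strstart: u64,
    pub block_start: i64,
    pub hash_ptr: u64,
    pub prev_ptr: u64,
    pub pending_ptr: u64,
    pub pending_len: u64,

    // Cache line 1: small fields grouped together (12 bytes used).
    _cache_line_1: (),
    pub match_length: u16,
    pub prev_length: u16,
    pub max_chain_length: u16,
    pub good_match: u16,
    pub nice_match: u16,
    pub level: i8,
    pub strategy: u8,
    // Explicit padding so the free space in this line stays visible.
    _padding_1: [u8; 52],
}

// Compile-time checks: the build fails if a marker drifts off its boundary.
const _: () = assert!(align_of::<PackedState>() == CACHE_LINE);
const _: () = assert!(offset_of!(PackedState, _cache_line_0) == 0);
const _: () = assert!(offset_of!(PackedState, _cache_line_1) == CACHE_LINE);
const _: () = assert!(size_of::<PackedState>() == 2 * CACHE_LINE);

/// Index of the cache line that contains the given byte offset.
pub fn cache_line_of(offset: usize) -> usize {
    offset / CACHE_LINE
}
```

With markers like these, narrowing a field (for example from a pointer-sized integer to u16) only needs the padding array adjusted, and the assertions confirm that no field moved to a different cache line by accident.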

Review thread on zlib-rs/src/deflate.rs (outdated, resolved)
@brian-pane
Author

Rebasing fixed some of the problems, but I can’t reproduce the CI test failure on aarch64.

@folkertdev
Collaborator

I can't reproduce them either; maybe GitHub CI is having issues? I just retried the jobs without success, but maybe it takes a little while. (The only cause I can think of is some sort of file/disk corruption.)

@bjorn3
Collaborator

bjorn3 commented Jan 8, 2025

The GitHub Actions problem should be resolved (https://www.githubstatus.com/incidents/dk61qxd21mtl). However, I just retried CI and it is still broken.

@folkertdev
Collaborator

Well, in https://github.com/trifectatechfoundation/zlib-rs/pull/280/files I made only whitespace changes, and CI still fails with the same error as here. I just don't see how that could be us.

@folkertdev
Collaborator

I'm getting some good instruction count wins locally now, and actual wall_time improvements in some cases.

Benchmark 2 (62 runs): target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.7ms ±  935us    80.4ms … 87.4ms          5 ( 8%)          -  1.3% ±  0.4%
  peak_rss           26.7MB ± 50.5KB    26.6MB … 26.7MB         11 (18%)          +  0.4% ±  0.1%
  cpu_cycles          299M  ± 2.72M      296M  …  315M           2 ( 3%)        ⚡-  1.6% ±  0.3%
  instructions        601M  ±  274       601M  …  601M           0 ( 0%)        ⚡-  9.1% ±  0.0%
  cache_references   19.9M  ±  166K     19.7M  … 20.6M           2 ( 3%)          +  0.5% ±  0.3%
  cache_misses        393K  ± 84.7K      293K  …  714K           1 ( 2%)        ⚡- 10.7% ±  7.1%
  branch_misses      2.98M  ± 4.51K     2.97M  … 2.99M           0 ( 0%)        💩+  1.5% ±  0.1%
Benchmark 2 (37 runs): target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           135ms ± 3.56ms     133ms …  153ms          2 ( 5%)          -  1.3% ±  0.9%
  peak_rss           25.0MB ± 59.0KB    24.9MB … 25.0MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          546M  ± 15.4M      538M  …  629M           2 ( 5%)          -  1.4% ±  1.0%
  instructions       1.12G  ±  280      1.12G  … 1.12G           0 ( 0%)        ⚡-  4.9% ±  0.0%
  cache_references   34.3M  ±  394K     33.7M  … 35.5M           1 ( 3%)          +  0.0% ±  0.5%
  cache_misses       1.05M  ±  233K      789K  … 1.65M           2 ( 5%)          +  3.4% ± 10.0%
  branch_misses      7.01M  ± 3.71K     7.00M  … 7.02M           1 ( 3%)          +  0.7% ±  0.0%
Benchmark 2 (34 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           151ms ± 1.55ms     148ms …  155ms          0 ( 0%)        ⚡-  2.9% ±  0.5%
  peak_rss           24.7MB ± 66.4KB    24.6MB … 24.8MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          615M  ± 6.70M      607M  …  637M           1 ( 3%)        ⚡-  3.4% ±  0.5%
  instructions       1.46G  ±  250      1.46G  … 1.46G           0 ( 0%)        ⚡-  3.7% ±  0.0%
  cache_references   43.9M  ±  630K     43.1M  … 45.3M           5 (15%)          -  0.7% ±  0.6%
  cache_misses       1.16M  ±  338K      850K  … 2.21M           7 (21%)          +  7.8% ± 14.1%
  branch_misses      7.86M  ± 4.05K     7.85M  … 7.87M           0 ( 0%)        💩+  1.0% ±  0.0%
Benchmark 2 (29 runs): target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           173ms ± 2.75ms     172ms …  186ms          2 ( 7%)        ⚡-  3.4% ±  1.6%
  peak_rss           24.6MB ± 61.7KB    24.5MB … 24.6MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          722M  ± 8.77M      715M  …  762M           3 (10%)        ⚡-  3.4% ±  1.4%
  instructions       1.53G  ±  293      1.53G  … 1.53G           0 ( 0%)        ⚡-  3.5% ±  0.0%
  cache_references   62.7M  ±  701K     61.9M  … 65.1M           1 ( 3%)          -  0.4% ±  0.6%
  cache_misses       1.82M  ±  356K     1.42M  … 2.79M           2 ( 7%)        💩+ 28.4% ± 12.5%
  branch_misses      8.54M  ± 17.8K     8.52M  … 8.60M           5 (17%)          +  0.9% ±  0.1%
Benchmark 1 (26 runs): target/release/examples/compress-baseline 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           189ms ± 1.46ms     187ms …  193ms          1 ( 4%)        ⚡-  2.2% ±  0.7%
  peak_rss           24.6MB ± 47.4KB    24.5MB … 24.6MB          4 (15%)          +  0.1% ±  0.1%
  cpu_cycles          792M  ± 5.36M      785M  …  808M           0 ( 0%)        ⚡-  2.5% ±  0.8%
  instructions       1.74G  ±  273      1.74G  … 1.74G           0 ( 0%)        ⚡-  2.9% ±  0.0%
  cache_references   68.6M  ±  680K     67.8M  … 69.9M           0 ( 0%)          +  0.1% ±  0.6%
  cache_misses       1.87M  ±  344K     1.42M  … 2.56M           0 ( 0%)        💩+ 17.6% ± 13.8%
  branch_misses      9.15M  ± 7.72K     9.14M  … 9.17M           0 ( 0%)          +  0.9% ±  0.0%
Benchmark 2 (22 runs): target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           228ms ± 3.56ms     224ms …  238ms          3 (14%)          -  0.6% ±  0.7%
  peak_rss           24.6MB ± 51.7KB    24.5MB … 24.6MB          4 (18%)          +  0.0% ±  0.1%
  cpu_cycles          975M  ± 14.5M      962M  … 1.02G           1 ( 5%)          -  0.9% ±  0.7%
  instructions       1.88G  ±  313      1.88G  … 1.88G           0 ( 0%)        ⚡-  2.6% ±  0.0%
  cache_references    107M  ±  923K      105M  …  109M           0 ( 0%)          +  0.9% ±  0.4%
  cache_misses       2.53M  ±  484K     1.98M  … 3.65M           0 ( 0%)        💩+ 18.3% ± 11.9%
  branch_misses      9.37M  ± 21.1K     9.35M  … 9.42M           3 (14%)          +  1.0% ±  0.1%
Benchmark 2 (12 runs): target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           431ms ± 5.06ms     428ms …  447ms          1 ( 8%)          -  0.4% ±  0.7%
  peak_rss           24.4MB ± 87.6KB    24.2MB … 24.5MB          0 ( 0%)          -  0.1% ±  0.3%
  cpu_cycles         1.91G  ± 20.5M     1.90G  … 1.98G           1 ( 8%)          -  0.3% ±  0.7%
  instructions       3.32G  ±  713      3.32G  … 3.32G           1 ( 8%)        ⚡-  1.8% ±  0.0%
  cache_references    197M  ± 1.23M      195M  …  199M           1 ( 8%)          +  0.5% ±  0.4%
  cache_misses       3.56M  ±  728K     2.51M  … 4.65M           0 ( 0%)        💩+ 77.8% ± 25.5%
  branch_misses      19.0M  ± 37.5K     18.9M  … 19.1M           1 ( 8%)          -  0.5% ±  0.2%

@folkertdev folkertdev merged commit 221cdd7 into trifectatechfoundation:main Jan 9, 2025
20 checks passed