Validate output of decoding #45
This introduces a serious performance regression for decode:

    $ cargo bench --bench="*" -- --baseline=perf2
       Compiling polyline v0.10.2 (/Users/mkirk/src/georust/polyline)
        Finished `bench` profile [optimized] target(s) in 0.77s
         Running benches/benchmarks.rs (target/release/deps/benchmarks-cc8f3ea04be06866)
    encode 10_000 coordinates at precision 1e-5
        time:   [105.25 µs 105.30 µs 105.37 µs]
        change: [-0.1263% -0.0235% +0.0878%] (p = 0.68 > 0.05)
        No change in performance detected.
    Found 4 outliers among 100 measurements (4.00%)
        1 (1.00%) low mild
        1 (1.00%) high mild
        2 (2.00%) high severe
    encode 10_000 coordinates at precision 1e-6
        time:   [129.27 µs 130.11 µs 130.80 µs]
        change: [-0.9966% -0.2021% +0.6397%] (p = 0.63 > 0.05)
        No change in performance detected.
    decode 10_000 coordinates at precision 1e-5
        time:   [164.64 µs 165.83 µs 167.02 µs]
        change: [+109.57% +111.90% +114.24%] (p = 0.00 < 0.05)
        Performance has regressed.
    decode 10_000 coordinates at precision 1e-6
        time:   [173.15 µs 174.21 µs 175.17 µs]
        change: [+85.462% +87.344% +89.230%] (p = 0.00 < 0.05)
        Performance has regressed.

Co-authored-by: mattiZed <[email protected]>
It's still slower than before the validation checks, but now only 10-20% slower rather than 90-110% slower.

    $ cargo bench --bench="*" -- --baseline=perf2
       Compiling polyline v0.10.2 (/Users/mkirk/src/georust/polyline)
        Finished `bench` profile [optimized] target(s) in 0.77s
         Running benches/benchmarks.rs (target/release/deps/benchmarks-cc8f3ea04be06866)
    encode 10_000 coordinates at precision 1e-5
        time:   [105.11 µs 105.16 µs 105.23 µs]
        change: [-0.2235% -0.1070% +0.0226%] (p = 0.09 > 0.05)
        No change in performance detected.
    Found 7 outliers among 100 measurements (7.00%)
        5 (5.00%) high mild
        2 (2.00%) high severe
    encode 10_000 coordinates at precision 1e-6
        time:   [123.98 µs 124.32 µs 124.77 µs]
        change: [-3.0982% -2.4787% -1.8225%] (p = 0.00 < 0.05)
        Performance has improved.
    decode 10_000 coordinates at precision 1e-5
        time:   [86.887 µs 87.835 µs 88.768 µs]
        change: [+10.484% +11.820% +13.207%] (p = 0.00 < 0.05)
        Performance has regressed.
    decode 10_000 coordinates at precision 1e-6
        time:   [110.52 µs 111.44 µs 112.35 µs]
        change: [+19.484% +20.773% +22.034%] (p = 0.00 < 0.05)
        Performance has regressed.
Force-pushed from 7c1dd92 to c57292a.
I'm surprised that this is touchy – there's nothing obviously "weird" in the new code. I'm not at my machine, and in any case it's a similar setup to yours iirc, so I can't confirm.
Here's an example of the touchiness: one tiny change to interpolate an existing var into the err message. We never hit this branch at runtime, so presumably it has some outsized effect on... locality... or register use...? 🤷 (surely not a compiler bug... right???). Whatever it is, I can reliably reproduce it with this change. And there were a couple of other small changes that you'd think innocuous but that tanked performance, so I had to be very careful while making these changes.
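For readers hitting something similar: one commonly used way to keep error formatting from affecting a hot loop is to build the error in a separate function marked `#[cold]` and `#[inline(never)]`. The sketch below is only an illustration under that assumption. `DecodeError`, `out_of_range`, and `check_all` are hypothetical names, not part of this crate, and whether this would change the benchmark numbers in this PR has not been tested.

```rust
// Hypothetical error type for illustration; not the crate's real error type.
#[derive(Debug, PartialEq)]
struct DecodeError {
    message: String,
}

// The formatting (and its allocation) lives here, outside the loop body.
// `#[cold]` hints that calls are unlikely; `#[inline(never)]` keeps the
// formatting code from being inlined into the caller.
#[cold]
#[inline(never)]
fn out_of_range(index: usize, value: f64) -> DecodeError {
    DecodeError {
        message: format!("coordinate {} out of range: {}", index, value),
    }
}

// Hot loop: only a comparison and a rarely taken branch per element.
fn check_all(values: &[f64], limit: f64) -> Result<(), DecodeError> {
    for (index, &value) in values.iter().enumerate() {
        if value.abs() > limit {
            return Err(out_of_range(index, value));
        }
    }
    Ok(())
}
```

Both attributes are hints to the compiler rather than guarantees about code layout, so any effect would need to be confirmed with the same benchmarks.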
I really don't understand it, but I don't understand a lot of things.
update: merged!
Depends on #43, so please review that first.
Supersedes #40.
FIXES #39 and #37 with less performance regression.
@mattiZed - Thanks for the inspiration - I've incorporated some of your code into 0d827dd, but I've omitted the LUT stuff for now. I wasn't able to convince myself that that part was an improvement yet.
I will note: some of this code is super touchy. Tiny, seemingly irrelevant changes would result in 50% swings in bench performance. Though my benchmark setup is crude, I've run the benchmarks back and forth enough to convince myself that this is not just noise.
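As background on what validating decode output can look like, here is a minimal, self-contained sketch of an encoded-polyline decoder that rejects invalid bytes, truncated input, and coordinates outside the latitude/longitude range. It assumes the validation amounts to those checks; `decode_checked` and `next_delta` are hypothetical names, the `(lat, lon)` tuple output and `String` errors are simplifications, and this is not the code in this PR.

```rust
// Illustrative only. Returns (lat, lon) pairs; `precision` is the number of
// decimal digits (5 for 1e-5, 6 for 1e-6).
fn decode_checked(encoded: &str, precision: u32) -> Result<Vec<(f64, f64)>, String> {
    let factor = 10f64.powi(precision as i32);
    let bytes = encoded.as_bytes();
    let mut pos = 0;
    let (mut lat, mut lon) = (0i64, 0i64);
    let mut out = Vec::new();
    while pos < bytes.len() {
        // Each point is stored as a delta from the previous point.
        lat += next_delta(bytes, &mut pos)?;
        lon += next_delta(bytes, &mut pos)?;
        let (y, x) = (lat as f64 / factor, lon as f64 / factor);
        // Validate the decoded output before accepting it.
        if y.abs() > 90.0 || x.abs() > 180.0 {
            return Err(format!("coordinate out of range before byte {}", pos));
        }
        out.push((y, x));
    }
    Ok(out)
}

// Reads one variable-length, zigzag-encoded signed integer.
fn next_delta(bytes: &[u8], pos: &mut usize) -> Result<i64, String> {
    let mut result: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *bytes
            .get(*pos)
            .ok_or_else(|| "unexpected end of input".to_string())?;
        // Valid characters are '?' (63) through '~' (126).
        if !(63..=126).contains(&byte) {
            return Err(format!("invalid character at byte {}", *pos));
        }
        if shift >= 60 {
            return Err("encoded value too long".to_string());
        }
        *pos += 1;
        let chunk = (byte - 63) as u64;
        result |= (chunk & 0x1f) << shift;
        shift += 5;
        // A clear 0x20 bit marks the last chunk of this value.
        if chunk < 0x20 {
            break;
        }
    }
    // Undo zigzag: the low bit carries the sign.
    let value = (result >> 1) as i64;
    Ok(if result & 1 == 1 { !value } else { value })
}
```

With the widely published sample string `` _p~iF~ps|U_ulLnnqC_mqNvxq`@ `` at precision 5, this sketch yields (38.5, -120.2), (40.7, -120.95), (43.252, -126.453).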