Reduce allocations for IIR filtering #614
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #614   +/-   ##
=======================================
  Coverage   97.96%   97.97%
=======================================
  Files          19       19
  Lines        3248     3252    +4
=======================================
+ Hits         3182     3186    +4
  Misses         66       66
For some other combinations of lengths there seem to be regressions, although idk how significant.
Benchmarks
julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(rand(2:30)), [1.0; rand(rand(1:29))]); x = rand(100); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 204 evaluations. # master
Range (min … max): 375.490 ns … 23.085 μs ┊ GC (min … max): 0.00% … 94.88%
Time (median): 891.667 ns ┊ GC (median): 0.00%
Time (mean ± σ): 909.695 ns ± 620.143 ns ┊ GC (mean ± σ): 3.23% ± 5.87%
▄▄█▃▁▁ ▁
▂▂▂▂▂▂▃▃▂▃▁▂▃▄▄▄▃▃▄▄████████▄▂▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
375 ns Histogram: frequency by time 1.79 μs <
Memory estimate: 64 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 213 evaluations. # PR
Range (min … max): 345.540 ns … 35.139 μs ┊ GC (min … max): 0.00% … 96.92%
Time (median): 916.432 ns ┊ GC (median): 0.00%
Time (mean ± σ): 919.951 ns ± 449.703 ns ┊ GC (mean ± σ): 1.04% ± 2.85%
▁▄▃▃▅▇▆▇▇█▆▄▅▃▃▂▁▁
▂▃▂▂▂▂▂▂▂▃▃▄▄▄▄▄▄▄▄▄▅▅▅▆▇████████████████████▆▆▆▅▅▄▄▃▃▃▃▂▂▂▂▂ ▄
346 ns Histogram: frequency by time 1.39 μs <
Memory estimate: 64 bytes, allocs estimate: 2.
julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(rand(2:30)), [1.0; rand(rand(1:29))]); x = rand(1_000); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 8 evaluations. # master
Range (min … max): 3.487 μs … 826.413 μs ┊ GC (min … max): 0.00% … 98.32%
Time (median): 8.488 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.397 μs ± 9.362 μs ┊ GC (mean ± σ): 1.45% ± 1.38%
▄▅█▇▇▅▄▂▂▃▂
▂▃▂▃▂▃▂▂▄▂▃▂▃▃▂▄▂▂▄▅▃▃▅▃▅▃▃▄███████████▆▆▆▅▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
3.49 μs Histogram: frequency by time 12.7 μs <
Memory estimate: 64 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 8 evaluations. # PR
Range (min … max): 3.225 μs … 29.587 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.825 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.899 μs ± 2.204 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▄▆█▆█▆▆▆▅▄▂▂▁
▂▃▂▃▂▃▂▃▄▄▄▄▃▄▄▅▄▄▄▅▆▇█████████████████▇▇▆▆▅▅▄▄▄▃▄▃▃▃▃▃▃▂▂ ▄
3.22 μs Histogram: frequency by time 14.1 μs <
Memory estimate: 64 bytes, allocs estimate: 2.
julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(rand(2:30)), [1.0; rand(rand(1:29))]); x = rand(10_000); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 1 evaluation. # master
Range (min … max): 34.900 μs … 399.500 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 84.300 μs ┊ GC (median): 0.00%
Time (mean ± σ): 84.039 μs ± 20.555 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▄▄▁▁ ▁
▄▂▃▁▂▁▃▁▃▂▁▄▃▂▂▂▄▄▂▄▂▄▂▄▅█▆████▇█▄▆▇▄▅▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
34.9 μs Histogram: frequency by time 138 μs <
Memory estimate: 64 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation. # PR
Range (min … max): 32.000 μs … 206.700 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 90.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 91.892 μs ± 24.194 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▂▄▆█▆█▆▆▅▅▃▄▄▃▄▂▂▁▁
▃▃▂▂▃▃▃▃▄▄▃▄▄▄▅▄▅▅▆████████████████████▇▇▆▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂ ▄
32 μs Histogram: frequency by time 154 μs <
Memory estimate: 64 bytes, allocs estimate: 2.
Hm, I've tried with the same benchmarks and the results are somewhat hard to interpret, as it's close to measurement noise, but to me it looks like this:
All of that might be dominated by randomness, though. Anyway, I've removed the rewrite to views to have this more focused. Can you re-do the benchmarks and see whether it's now an improvement for you, too?
Hm, not much difference without the views (only tried 1.11.2 for this comparison). If it matters, I gave them each a warmup run and afterwards they were pretty consistent. It might be machine-dependent, but I found an example that consistently shows a difference for me.
julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(15), [1.0; rand(29)]); x = rand(10_000); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 1 evaluation. # master
Range (min … max): 87.700 μs … 280.400 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 102.500 μs ┊ GC (median): 0.00%
Time (mean ± σ): 106.822 μs ± 14.591 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▅█
█▆▂▂▃▄▄███▇▆▄▅▆▅▅▄▄▃▃▂▃▄▄▄▄▃▆▄▅▄▃▃▃▃▂▃▄▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
87.7 μs Histogram: frequency by time 150 μs <
Memory estimate: 592 bytes, allocs estimate: 4.
BenchmarkTools.Trial: 10000 samples with 1 evaluation. # PR
Range (min … max): 115.000 μs … 436.600 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 140.700 μs ┊ GC (median): 0.00%
Time (mean ± σ): 146.303 μs ± 16.144 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▂▂▁▃▃▄█▅▆▄▃▂▁▃▃▂▂▂▃▂▃▂▁▁▁▁▂▂▂▂▂ ▂
█▅▆█▇▇▇▇▆███████████████████████████████████▇▇▇▇▇▆▆▆▆▅▅▅▅▅▅▄▆ █
115 μs Histogram: log(frequency) by time 200 μs <
Memory estimate: 288 bytes, allocs estimate: 2.
OTOH it isn't so one-sided: for other coefficient lengths, the PR is better. So I guess there are just some tradeoffs to be made.
julia> @btime filt!(out, b, a, x) setup=((b, a) = (rand(5), [1.0; rand(29)]); x = rand(10_000); out = similar(x));
87.800 μs (4 allocations: 592 bytes) # master
76.800 μs (2 allocations: 288 bytes) # PR
EDIT: similar results on 1.10.7
So here's my hypothesis as to what's going on: The run-time of the loops only scales approximately linearly wrt. iteration count, depending on how well the iteration count matches the vectorization width. So if
Overall, my feeling is that we shouldn't over-value these benchmarks. Users who really want to squeeze the last bits of performance out of
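To make the vectorization-width hypothesis concrete, here is a minimal sketch of how an inner-loop iteration count splits into full vector iterations plus a scalar remainder. The helper `simd_split` is hypothetical, and the width of 4 lanes (e.g. `Float64` in 256-bit registers) is an assumption; the actual width depends on the machine and on what the compiler decides to do.

```julia
# Hypothetical helper: split `n` loop iterations into full SIMD chunks of
# `width` lanes plus a scalar remainder. Not part of DSP.jl.
simd_split(n, width) = divrem(n, width)

# Assumed vector width of 4 lanes; purely illustrative.
for n in (5, 15, 16, 29)
    chunks, rest = simd_split(n, 4)
    println("n = $n: $chunks full vector iterations + $rest leftover scalar iterations")
end
```

Under this assumption, 15 and 16 iterations both need roughly four passes, so run-time would grow in steps rather than strictly linearly, which may be part of why the benchmarks above flip depending on the coefficient lengths.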
Fair enough. Then I guess this is good to go.
@wheeheee I'm not too excited about the commit you've pushed. It accesses internals of
Yeah, no problem. Just thought it might be helpful, feel free to change.
Avoid the necessity to re-allocate one of the coefficient vectors if they have different lengths by letting _filt_iir! handle different lengths.
(The second commit is some performance-neutral clean-up. I'm not 100% sure we should do it.)
For equal-length coefficient vectors performance is practically the same:
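As an illustration of the idea in this description (handling different coefficient lengths without padding), here is a minimal sketch of a transposed direct form II filter loop. It is not the code from this PR: `filt_unequal!`, `coef`, and `pad` are hypothetical names, the sketch assumes `a[1] == 1`, and it allocates its own state vector for simplicity.

```julia
# Hypothetical helper: coefficient `c[k]`, or zero when `k` is past the end of `c`.
coef(c, k) = k <= length(c) ? c[k] : zero(eltype(c))

# Sketch of a transposed direct form II IIR filter that accepts `b` and `a`
# of different lengths without padding either one. Assumes a[1] == 1.
function filt_unequal!(out, b, a, x)
    a[1] == 1 || throw(ArgumentError("this sketch assumes a[1] == 1"))
    n = max(length(b), length(a))
    state = zeros(eltype(out), n)  # last slot stays zero to simplify the shift
    for i in eachindex(x, out)
        xi = x[i]
        yi = b[1] * xi + state[1]
        for k in 1:n-1
            # state[k+1] still holds its value from the previous sample here
            state[k] = state[k+1] + coef(b, k + 1) * xi - coef(a, k + 1) * yi
        end
        out[i] = yi
    end
    return out
end

# Reference approach for comparison: zero-pad the shorter vector (allocates a copy).
pad(c, n) = [c; zeros(eltype(c), n - length(c))]

b = [0.2, 0.3]
a = [1.0, -0.5, 0.25, 0.1]
x = [1.0, 0.0, 0.0, 0.0, 0.0]  # unit impulse
y = filt_unequal!(similar(x), b, a, x)
y_padded = filt_unequal!(similar(x), pad(b, length(a)), a, x)
```

Both calls should produce the same output; the difference is only that the padded variant has to allocate a new coefficient vector first. The bounds check in `coef` inside the inner loop is the simplest way to express the idea and is not necessarily the fastest.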