Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce allocations for IIR filtering #614

Merged
merged 1 commit into from
Jan 9, 2025
Merged

Conversation

martinholters
Copy link
Member

@martinholters martinholters commented Dec 16, 2024

Avoid the necessity to re-allocate one of the coefficient vectors if they have different lengths by letting _filt_iir! handle different lengths.
(The second commit is some performance-neutral clean-up. I'm not 100% sure we should do it.)

For equal-length coefficient vectors performance is practically the same:

julia> @btime filt!($(zeros(100)), $(rand(2)), $([1; rand(1)]), $(rand(100))); # a and b both length 2
  344.546 ns (1 allocation: 64 bytes) # PR
  348.047 ns (1 allocation: 64 bytes) # master

julia> @btime filt!($(zeros(100)), $(rand(3)), $([1; rand(2)]), $(rand(100))); # a and b both length 3
  319.090 ns (1 allocation: 80 bytes) # PR
  317.876 ns (1 allocation: 80 bytes) # master

julia> @btime filt!($(zeros(100)), $(rand(2)), $([1; rand(2)]), $(rand(100)));
  294.293 ns (1 allocation: 80 bytes) # PR
  330.063 ns (2 allocations: 160 bytes) # master

julia> @btime filt!($(zeros(100)), $(rand(3)), $([1; rand(1)]), $(rand(100)));
  285.819 ns (1 allocation: 80 bytes) # PR
  332.861 ns (2 allocations: 160 bytes) # master

julia> @btime filt!($(zeros(100)), $(rand(100)), $([1; rand(1)]), $(rand(100)));
  1.247 μs (1 allocation: 896 bytes) # PR
  1.495 μs (2 allocations: 1.75 KiB) # master

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.97%. Comparing base (3e79857) to head (e3f404b).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #614   +/-   ##
=======================================
  Coverage   97.96%   97.97%           
=======================================
  Files          19       19           
  Lines        3248     3252    +4     
=======================================
+ Hits         3182     3186    +4     
  Misses         66       66           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wheeheee
Copy link
Member

For some other combinations of lengths there seem to be regressions, although idk how significant.

Benchmarks
julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(rand(2:30)), [1.0; rand(rand(1:29))]); x = rand(100); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 204 evaluations.   # master
 Range (min  max):  375.490 ns   23.085 μs  ┊ GC (min  max): 0.00%  94.88%
 Time  (median):     891.667 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   909.695 ns ± 620.143 ns  ┊ GC (mean ± σ):  3.23% ±  5.87%

                      ▄▄█▃▁▁ ▁
  ▂▂▂▂▂▂▃▃▂▃▁▂▃▄▄▄▃▃▄▄████████▄▂▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  375 ns           Histogram: frequency by time         1.79 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

BenchmarkTools.Trial: 10000 samples with 213 evaluations.   # PR
 Range (min  max):  345.540 ns   35.139 μs  ┊ GC (min  max): 0.00%  96.92%
 Time  (median):     916.432 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   919.951 ns ± 449.703 ns  ┊ GC (mean ± σ):  1.04% ±  2.85%

                           ▁▄▃▃▅▇▆▇▇█▆▄▅▃▃▂▁▁
  ▂▃▂▂▂▂▂▂▂▃▃▄▄▄▄▄▄▄▄▄▅▅▅▆▇████████████████████▆▆▆▅▅▄▄▃▃▃▃▂▂▂▂▂ ▄
  346 ns           Histogram: frequency by time         1.39 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(rand(2:30)), [1.0; rand(rand(1:29))]); x = rand(1_000); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 8 evaluations.  # master
 Range (min  max):  3.487 μs  826.413 μs  ┊ GC (min  max): 0.00%  98.32%
 Time  (median):     8.488 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.397 μs ±   9.362 μs  ┊ GC (mean ± σ):  1.45% ±  1.38%

                              ▄▅█▇▇▅▄▂▂▃▂
  ▂▃▂▃▂▃▂▂▄▂▃▂▃▃▂▄▂▂▄▅▃▃▅▃▅▃▃▄███████████▆▆▆▅▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
  3.49 μs         Histogram: frequency by time        12.7 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

BenchmarkTools.Trial: 10000 samples with 8 evaluations.  # PR
 Range (min  max):  3.225 μs  29.587 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     8.825 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.899 μs ±  2.204 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                        ▁▂▄▆█▆█▆▆▆▅▄▂▂▁
  ▂▃▂▃▂▃▂▃▄▄▄▄▃▄▄▅▄▄▄▅▆▇█████████████████▇▇▆▆▅▅▄▄▄▃▄▃▃▃▃▃▃▂▂ ▄
  3.22 μs        Histogram: frequency by time        14.1 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(rand(2:30)), [1.0; rand(rand(1:29))]); x = rand(10_000); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.  # master
 Range (min  max):  34.900 μs  399.500 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     84.300 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   84.039 μs ±  20.555 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                           █ ▄▄▁▁ ▁
  ▄▂▃▁▂▁▃▁▃▂▁▄▃▂▂▂▄▄▂▄▂▄▂▄▅█▆████▇█▄▆▇▄▅▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  34.9 μs         Histogram: frequency by time          138 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

BenchmarkTools.Trial: 10000 samples with 1 evaluation.  # PR
 Range (min  max):  32.000 μs  206.700 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     90.100 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.892 μs ±  24.194 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▂▂▄▆█▆█▆▆▅▅▃▄▄▃▄▂▂▁▁
  ▃▃▂▂▃▃▃▃▄▄▃▄▄▄▅▄▅▅▆████████████████████▇▇▆▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂ ▄
  32 μs           Histogram: frequency by time          154 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

src/Filters/filt.jl Outdated Show resolved Hide resolved
src/dspbase.jl Outdated Show resolved Hide resolved
@martinholters
Copy link
Member Author

Hm, I've tried with the same benchmarks and the results are somewhat hard to interpret, as it's close to measurement noise, but to me it looks like this:

  • On Julia 1.10.7, the PR is slightly better than master.
  • On Julia 1.11.2, master is slightly better than this PR.
  • But not doing the views thing, the PR is slightly better on Julia 1.11.2, too.

All of that might be dominated by randomness, though. Anyway, I've removed the rewrite to views to have this more focused. Can you re-do the benchmarks and see whether it's now an improvement for you, too?

@wheeheee
Copy link
Member

wheeheee commented Dec 17, 2024

Hm, not much difference without the views (only tried 1.11.2 for this comparison). If it matters, I gave them each a warmup run and afterwards they were pretty consistent. It might be machine-dependent, but I found an example that consistently shows a difference for me.

julia> @benchmark filt!(out, b, a, x) setup=((b, a) = (rand(15), [1.0; rand(29)]); x = rand(10_000); out = similar(x))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.  # master
 Range (min  max):   87.700 μs  280.400 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     102.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   106.822 μs ±  14.591 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂      ▅█
  █▆▂▂▃▄▄███▇▆▄▅▆▅▅▄▄▃▃▂▃▄▄▄▄▃▆▄▅▄▃▃▃▃▂▃▄▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  87.7 μs          Histogram: frequency by time          150 μs <

 Memory estimate: 592 bytes, allocs estimate: 4.

BenchmarkTools.Trial: 10000 samples with 1 evaluation.  # PR
 Range (min  max):  115.000 μs  436.600 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     140.700 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   146.303 μs ±  16.144 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂          ▂▂▁▃▃▄█▅▆▄▃▂▁▃▃▂▂▂▃▂▃▂▁▁▁▁▂▂▂▂▂                    ▂
  █▅▆█▇▇▇▇▆███████████████████████████████████▇▇▇▇▇▆▆▆▆▅▅▅▅▅▅▄▆ █
  115 μs        Histogram: log(frequency) by time        200 μs <

 Memory estimate: 288 bytes, allocs estimate: 2.

OTOH it isn't so one-sided, for other coefficient lengths, the PR is better. So I guess there are just some tradeoffs to be made.

julia> @btime filt!(out, b, a, x) setup=((b, a) = (rand(5), [1.0; rand(29)]); x = rand(10_000); out = similar(x));
  87.800 μs (4 allocations: 592 bytes)  # master
  76.800 μs (2 allocations: 288 bytes)  # PR

EDIT: similar results on 1.10.7

@martinholters
Copy link
Member Author

So here's my hypothesis to what's going on: The run-time of the loops only scales approximately linear wrt. iteration count, depending on how well the iteration count matches the vectorization width. So if max(length(a), length(b)) (before any padding) is a "good" length, but min(length(a), length(b)) is a "bad" length (and their difference is a "bad" length for the second loop, too), then this PR can lead to inferior performance.
This also seems to be system specific (not unexpected if my suspicion is correct), as for the example you have provided, I only see about 10% run-time increase compared to your 40%.

Overall, my feeling is that we shouldn't over-value these benchmarks. Users who really want to squeeze the last bits of performance out of filt! on their specific system can always just zero-pad their coefficients until they reach an optimum. For your example, I see improved performance replacing b with [rand(15); zeros(2)]. YMMV.

Copy link
Member

@wheeheee wheeheee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Then I guess this is good to go.

@martinholters
Copy link
Member Author

@wheeheee I'm not too excited of the commit you've pushed. It accesses internals of LaurentPolynomial which AFAICT are non-public/undocumented, and the gain is not that much. Are you ok with me undoing that and then merging?

@wheeheee
Copy link
Member

wheeheee commented Jan 9, 2025

Yeah, no problem. Just thought it might be helpful, feel free to change.

@martinholters martinholters merged commit 73d58fb into master Jan 9, 2025
19 checks passed
@martinholters martinholters deleted the mh/improve-_filt_iir branch January 9, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants