-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement time-domain convolution and use it for integers #545
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #545 +/- ##
==========================================
+ Coverage 97.81% 97.83% +0.01%
==========================================
Files 18 19 +1
Lines 3204 3226 +22
==========================================
+ Hits 3134 3156 +22
Misses 70 70 ☔ View full report in Codecov by Sentry. |
Should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe Val(N)
would help with type stability
It does for |
Hm. Export |
I kind of like the names |
68c20b4
to
3bb56ca
Compare
I've implemented the keyword-argument-based algorithm selection. I'm not entirely sure I like it, but the first impression is okayish. The |
I took reference to FastConv.jl (licensed under the "MIT License (Expat)") and wrote a version with an explicit loop over function _tdconv1!(out, E::AbstractArray{<:Number,N}, k::AbstractArray{<:Number,N}) where {N}
output_indices = CartesianIndices(ntuple(i -> 1:(size(E, i)+size(k, i)-1), Val(N)))
checkbounds(out, output_indices)
fill!(out, 0)
offset = CartesianIndex(ntuple(_ -> 1, Val(N)))
for j in CartesianIndices(E), i in CartesianIndices(k)
out[i+j-offset] = muladd(E[j], k[i], out[i+j-offset])
end
return out
end Benchmarks: julia> x = rand(100); y = rand(100); out = rand(199);
julia> @benchmark _conv_td!($out, $x, $y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 9.000 μs … 38.000 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 9.216 μs ± 602.871 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ █ ▃ ▂
▅▁▁▁█▁▁▁▁█▁▁▁▁█▁▁▁▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁█▁▁▁█ █
9 μs Histogram: log(frequency) by time 10.2 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark _tdconv1!($out, $x, $y)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.510 μs … 5.720 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.520 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.583 μs ± 101.580 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▂
▄▁▁█▁▁▃▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▄▁▁█▁▁▂ ▂
1.51 μs Histogram: frequency by time 1.69 μs <
Memory estimate: 0 bytes, allocs estimate: 0. |
Yep. I used the generator-based version because it allowed to derive the output type based on the actual values. With the latest change from |
If you're referring to the somewhat messy organization in |
I was primarily referring to the keyword-based API. Once that's settled, the organization of the So let's discuss API. Some random thoughts I have gathered while working on this:
I think the current kwargs approach addresses the first three items, although choosing between |
My opinion, for what it's worth:
|
I'm no fan of Otherwise, this is slowly getting shape -- partially in code, partially in my head for now. One thing I haven't made up mind about is how
The two cases are inconsistent with each other for the input-indices-start-at-1 case, but each one individually is very much the only reasonable choice, so that's where we are. Now if the output provided to Of course, erroring for unclear cases would open the possibility to decide later on, so that might be the safest route for now. |
Sounds good. Another thought I had regarding commutativity and axes, while reading this after prematurely micro-optimizing the loop: even if we consider
I also just realized that |
addb53b
to
fce6cf9
Compare
These all seem like great changes! |
This turns out to be a but more of a rabbit hole than I had anticipated, but it's slowly getting into shape. Apart from some more tests and documentation, the main to-do appears to be a better heuristic to choose the fastest algorithm. But the working of the
The default for Another alternative would be to replace the |
I think allowing the user to choose the algorithm would be better. In my previous attempts to add direct convolution to this and figure out which alg to choose, I found it very hard to come up with heuristics since dimensionality, size of the problem, element type, number of threads available, and architecture (eg avx instructions available) all come into play. Doing a best effort selection and letting users who really care override seems better imo. |
That would then include the option the choose between simple FFT and overlap-save, right? (I had this in an earlier iteration, can revive.) |
I agree with giving users the ability to choose specific algorithms. Maybe we could have |
I think the second option with |
Do you mean @wheeheee I don't think allowing different kwargs in |
Yes, that's what I meant. |
Another pair of eyes proof-reading the added warning to the docstring would be welcome. Then, this should be ready to (squash!)-merge. |
Just a side note that from 1.10 -> 1.11 the tests take something like 0.8x more time, which imo is a pretty serious performance regression. Can't be certain but probably not this PR's fault, since it's present in all sections. Locally, it's not that bad, but still slightly worse. |
Julia 1.11.0 had some compile time (and other latency) regressions, which apparently have only been partially fixed in 1.11.1. I'm inclined to blame those, given that the tests are typically pretty compile-heavy. So with some luck, things will improve with upcoming Julia versions. If not, we should inspect this more closely. |
src/dspbase.jl
Outdated
calc_index_offset(ao::Base.OneTo, au::Base.OneTo, av::Base.OneTo) = 1 | ||
calc_index_offset(ao::Base.OneTo, au::AbstractUnitRange, av::AbstractUnitRange) = # first(au) + first(av) - 1 | ||
throw(ArgumentError("output must have offset axes if the input has")) | ||
calc_index_offset(ao::AbstractUnitRange, au::Base.OneTo, av::Base.OneTo) = # 2 | ||
throw(ArgumentError("output must not have offset axes if none of the inputs has")) | ||
calc_index_offset(ao::AbstractUnitRange, au::AbstractUnitRange, av::AbstractUnitRange) = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran into some trouble while testing whether the PR would partially resolve #497. Briefly, this part throws an ArgumentError for SOneTo
axes, although they are not offset axes. Should we consider using Base.has_offset_axes
for this part instead of testing whether axes are OneTo
?
edit: ok, I guess this would mean a 1-based OffsetArray would be treated like an Array. Maybe there could be more documentation on what is meant by "offset axes"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that is unfortunate. I'll think about it while I'm away from the computer for the rest of the week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In brief, the problem manifests itself as:
conv(@SVector([1,1]), @SVector([1,2,3]))
ERROR: MethodError: no method matching similar(::SVector{2, Int64}, ::Type{Int64}, ::Tuple{UnitRange{Int64}})
or, after using OffsetArrays
, as
julia> conv(@SVector([1,1]), @SVector([1,2,3]))
4-element OffsetArray(::Vector{Int64}, 2:5) with eltype Int64 with indices 2:5:
1
3
5
3
which may also come unexpected (note the offset axis). The result one would actually desire here is
4-element SVector{4, Int64} with indices SOneTo(4):
1
3
5
4
This makes me really skeptical of the approach of treating anything unknown as having an offset, as that may lock us into unwanted behavior to avoid breakage later on.
So my current thought is: Bail on anything not OneTo
. But do so via a function to which additional methods can be added to specify the desired behavior. Then we have the option of providing package extensions e.g. for StaticArrays
making SOneTo
behave in the no-offset way and for OffsetArrays
making IdOffsetRange
do the offset thing. Yes, that means any unsupported axis type will yield an error. But I'd prefer that to doing something arbitrary to later find out that something else would have been better. I think I will try that approach with StaticArrays
and OffsetArrays
as trial balloons to see how it feels. Opinions so far?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first draft of the extension-based implementation looks promising to me. Especially for StaticArrays
, that is probably the only way to get it type-stable. Downside is that this won't work for Julia older than 1.9. But maybe we want to stop supporting anything before 1.10 anyways?
Okay, pushed a version utilizing package extensions. The |
0405374
to
879b409
Compare
Sounds good! Not familiar with package extensions, so maybe it would be better if there's another reviewer... |
* add `:algorithm` kwarg * add `conv!`
Co-authored-by: wheeheee <[email protected]>
Co-authored-by: wheeheee <[email protected]>
879b409
to
34f42b2
Compare
This does the very minimal thing to get discussion started. I think we should do the time-domain convolution in more cases, but before going in that direction, I'd first like to get feedback.
In particular, for "small"
u
andv
, time-domain convolution would be more efficient and I see no reason not to do it. It just takes a bit of benchmarking to figure out what "small" should be exactly, here. And it would also fix this minor issue:The time-domain convolution form this PR does the right thing:
Another point worth considering is whether to use time-domain convolution for more input types. Consider
For
BigFloat
, this would change from throwing an error to working, but potentially slow, which I'd consider an improvement. ForRational
, I wonder whether doing the computation onRational
s (and also returningRational
s) wouldn't be better. The extreme measure would be to do time-domain convolution by default and only use FFT-based approaches forFloat32
andFloat64
input (above a certain problem size), which would still cover most cases in practice, I guess. Opinions?Note that if we decide to do time-domain convolution in more cases, we can always do that in later PRs; this one could stand on its own.
Closes #411, fixes #410.