[RTL SWG] Support SIMD < C in window-parallel mode #922
Many thanks for this feature, @fpjentzsch! This is not a full review, but while testing this on a larger network I came across one issue during FIFO sizing with Verilator. Although it is not part of the changes in this PR itself, I think the kinds of SWGs enabled by the PR are more likely to run into this issue: the Specifically, this is the error message I observed:
The offending piece of code in context below - the I'll give this a try with
A suggestion from @preusser (which Verilator seems to be happy with) is to use a sliced vector assignment instead of the for-loop:
Thanks @maltanar, I incorporated this fix and, from my side, we could merge this PR already. To increase resource efficiency in cases like this, I'm currently experimenting with a "depth threshold" setting, which would split up deep shift registers where not all elements need to be accessed in parallel (such as for large
Thanks, @fpjentzsch!
Previously, full SWG SIMD parallelism (SIMD = # Channels) was required before enabling the window-parallel mode. Due to the depthwise data layout, this prevented VVAU SIMD unfolding (across the kernel dimensions) unless VVAU PE (across the channel dimension) was maxed out.
This adds support for SWG SIMD < C when the SWG is in window-parallel and depthwise mode.
Note that the SIMD of the SWG must match the PE of the following VVAU.
VVAU SIMD < K is supported via a normal DWC, which is inserted automatically by the compiler.
This experimental HLS DWC component was previously introduced as a workaround for this problem and should now be obsolete: Xilinx/finn-hlslib#134
SWG SIMD < C is also allowed in the 1x1 kernel case (regardless of whether parallel_window is set), which should fix #895.
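The folding rules described above can be summarized as a small check. This is only a sketch of the constraints as stated in this description, not the code added by the PR: the function name, its parameters, and the requirement that SIMD divides C evenly are assumptions made for illustration.

```python
def check_swg_folding(simd, channels, kernel_hw, parallel_window,
                      depthwise, vvau_pe=None):
    """Hypothetical helper: validate an SWG folding choice against the
    rules stated in this PR description. Not part of the FINN API."""
    # Assumption (not stated in the PR): SIMD must divide C evenly.
    if simd < 1 or channels % simd != 0:
        raise ValueError("SIMD must be a positive divisor of C")

    is_1x1 = tuple(kernel_hw) == (1, 1)

    # SIMD < C in window-parallel mode is only described for depthwise
    # SWGs; the 1x1 kernel case is allowed regardless of parallel_window.
    if simd < channels and parallel_window and not (depthwise or is_1x1):
        raise ValueError(
            "SIMD < C with parallel_window requires depthwise mode "
            "or a 1x1 kernel"
        )

    # If a VVAU follows, the SWG SIMD must match the VVAU PE.
    # vvau_pe=None means no VVAU consumer is being checked.
    if vvau_pe is not None and simd != vvau_pe:
        raise ValueError("SWG SIMD must match the PE of the following VVAU")

    return True
```

For example, `check_swg_folding(4, 16, (3, 3), True, True, vvau_pe=4)` passes, while the same call with `depthwise=False` raises a `ValueError`. Modes the description does not mention (such as SIMD < C without parallel_window and with a larger kernel) are deliberately left unchecked here.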