
[RTL SWG] Support SIMD < C in window-parallel mode #922

Merged: 3 commits into Xilinx:dev on Jan 8, 2024

Conversation

@fpjentzsch (Collaborator) commented on Nov 20, 2023

Previously, full SWG SIMD parallelism (SIMD = # Channels) was required before enabling the window-parallel mode. Due to the depthwise data layout, this prevented VVAU SIMD unfolding (across the kernel dimensions) unless VVAU PE (across the channel dimension) was maxed out.

This adds support for SWG SIMD < C when the SWG is in window-parallel and depthwise mode.
Note that the SIMD of the SWG must match the PE of the following VVAU.
VVAU SIMD < K is supported via a normal DWC, which is inserted automatically by the compiler.

The experimental HLS DWC component previously introduced as a workaround for this problem should now be obsolete: Xilinx/finn-hlslib#134

SWG SIMD < C is also allowed in the 1x1 kernel case (regardless of whether parallel_window is set), which should fix #895.

@auphelia requested a review from mmrahorovic on Nov 21, 2023
@maltanar (Collaborator) commented on Nov 26, 2023

Many thanks for this feature @fpjentzsch! Not a full review, but while testing this on a larger network I came across one issue during FIFO sizing with Verilator. Although it is not part of the changes in this PR itself, I think the kinds of SWGs enabled by the PR are more likely to trigger it: the DEPTH of the swg_reg_buffer is sometimes large enough to run into particular Verilator coding-style limitations.

Specifically, this is the error message I observed:

%Error-BLKLOOPINIT: /scratch/finn/verilator_fifosim_iletty6g/finn_design_wrapper.v:3250:17: Unsupported: Delayed assignment to array inside for loops (non-delayed is ok - see docs)
 3250 |             Data[i] <= Data[i-1];

The offending piece of code is shown in context below; the DEPTH of the swg_reg_buffer instances in this example can be as large as 832, which appears to exceed Verilator's default loop-unrolling limit in this scenario.

[screenshot omitted: swg_reg_buffer shift-loop code in finn_design_wrapper.v]
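
Since the screenshot itself is not reproduced here, the snippet below is a hypothetical reconstruction of the coding style that triggers this error, pieced together from the quoted line and the signal names used in the suggested fix further down; WIDTH, clk, and the exact declarations are assumptions.

// Hypothetical reconstruction, not the actual generated code:
logic [WIDTH-1:0] Data [DEPTH-1:0];

always @(posedge clk) begin
  if (shift_enable) begin
    // Non-blocking ("delayed") assignment to an array element inside a for
    // loop: Verilator only accepts this if it can fully unroll the loop,
    // which apparently does not happen here for large DEPTH (BLKLOOPINIT).
    for (int i = DEPTH-1; i > 0; i--)
      Data[i] <= Data[i-1];
    Data[0] <= shift_in;
  end
end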

I'll give this a try with --unroll-count 1000 to see if it makes a difference for my use-case, but perhaps there is a better solution here, either by changing the coding style or getting the FINN compiler to generate code with those statements unrolled. (update: that did not help, unfortunately)

@maltanar (Collaborator) commented:

A suggestion from @preusser is to use a sliced vector assignment instead of the for loop; Verilator is happy with this, since it no longer has to unroll a per-element loop:

if(shift_enable) begin
  if(DEPTH > 1)  Data[DEPTH-1:1] <= Data[DEPTH-2:0];
  Data[0] <= shift_in;
end

@fpjentzsch (Collaborator, Author) commented:

Thanks @maltanar, I incorporated this fix; from my side, this PR is ready to merge.

To increase resource efficiency in cases like this, I'm currently experimenting with a "depth threshold" setting, which would split up deep shift registers where not all elements need to be accessed in parallel (such as for large dilation_w or large #Channels/SIMD) into smaller shift registers and LUTRAM/BRAM buffers. I prefer to avoid yet another configurable attribute for the custom_op, so I'm doing some benchmarking in an attempt to find a reasonable value for this threshold and the overall resource impact that this could have.
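
For illustration only, a minimal SystemVerilog sketch of that idea follows; it is not part of this PR, and the module name, parameters, and interface are invented. The bulk of the buffer becomes a RAM-backed circular delay line without parallel taps (so it can map to LUTRAM/BRAM), and only a short register tail keeps its elements individually accessible; overall the module behaves like a shift register of depth RAM_DEPTH + TAP_DEPTH + 1.

// Illustrative sketch only: one deep shift register split into a RAM delay
// line (no parallel taps) plus a short, fully tapped register tail.
module split_shift_buffer #(
  parameter WIDTH     = 8,
  parameter RAM_DEPTH = 768,  // elements buffered in RAM, not readable in parallel
  parameter TAP_DEPTH = 64    // elements kept in registers, readable in parallel
)(
  input  logic             clk,
  input  logic             shift_enable,
  input  logic [WIDTH-1:0] shift_in,
  output logic [WIDTH-1:0] taps [TAP_DEPTH-1:0]
);
  // Circular buffer emulating a fixed delay of RAM_DEPTH+1 cycles
  // (the registered read adds one extra stage).
  logic [WIDTH-1:0]             ram [RAM_DEPTH-1:0];
  logic [$clog2(RAM_DEPTH)-1:0] ptr = 0;
  logic [WIDTH-1:0]             ram_out;

  always @(posedge clk) begin
    if (shift_enable) begin
      ram_out  <= ram[ptr];      // oldest element leaves the delay line
      ram[ptr] <= shift_in;      // newest element takes its slot
      ptr      <= (ptr == RAM_DEPTH-1) ? 0 : ptr + 1;
    end
  end

  // Short register tail, using the sliced-assignment style suggested above
  always @(posedge clk) begin
    if (shift_enable) begin
      if (TAP_DEPTH > 1) taps[TAP_DEPTH-1:1] <= taps[TAP_DEPTH-2:0];
      taps[0] <= ram_out;
    end
  end
endmodule

Whether a split like this actually saves resources, and where the depth threshold should sit, is exactly what the benchmarking mentioned above is meant to determine.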

@preusser requested review from preusser and auphelia on Jan 5, 2024

@preusser (Collaborator) left a comment:

Thanks, @fpjentzsch!

@auphelia merged commit e3cb226 into Xilinx:dev on Jan 8, 2024 (2 checks passed)
@mmrahorovic mentioned this pull request on Mar 4, 2024 and again on Mar 14, 2024.