-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Analytical FIFO sizing #1185
base: dev
Are you sure you want to change the base?
Analytical FIFO sizing #1185
Conversation
…tgenerator, fmpadding, labelselect, matrixvectoractivation, pool, streamingdatawidthconverter (generalized variant, very conservative estimate), streamingmaxpool, thresholding, vectorvectoractivation
Signed-off-by: lstasytis <[email protected]>
Signed-off-by: lstasytis <[email protected]>
Signed-off-by: lstasytis <[email protected]>
Signed-off-by: lstasytis <[email protected]>
…ing and specific cases of streamingmaxpool and slidingwindow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not run the tests for thresholding & vvau yet.
In general we probably want to reduce the number of tests that go into the default suite. Currently the most test cases are located in
- ConvInpGen: 20.736
- Thresholding: 4.608
- VVAU: 2.304
Maybe we can still include an extended test suite via pytest.fixture or pytest_generate_tests functions @auphelia?
@@ -166,12 +166,17 @@ def get_verilog_top_module_intf_names(self): | |||
) | |||
return intf_names | |||
|
|||
def derive_characteristic_fxns(self, period): | |||
def derive_characteristic_fxns( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring also needs to be applied to addstreams.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeriveFIFOSizes expects outFIFODepths
to be set for every output of every CustomOp. This is problematic in the current implementation of DuplicateStreams, as the default value of [2] (=1 output with FIFO depth set to 2) is inherited from HWCustomOp, even though DuplicateStreams has NumOutputStreams
outputs.
How/where do we set the default value of one attribute depending on another attribute without breaking anything? In verify_node()?
code_gen_dir = self.get_nodeattr("code_gen_dir_ipgen") | ||
# ensure that there is a directory | ||
if code_gen_dir == "" or not os.path.isdir(code_gen_dir): | ||
code_gen_dir = make_build_dir(prefix="code_gen_ipgen_" + str(self.name) + "_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be self.onnx_node.name
instead of self.name
# call the compilation function for this node | ||
self.ipgen_singlenode_code() | ||
else: | ||
warnings.warn("Using pre-existing IP for %s" % self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be self.onnx_node.name
instead of self.name
self.code_generation_ipgen(model, fpga_part, clk_period) | ||
|
||
# lazy construction of hlssynthip step | ||
if is_hls_node(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need an else branch for this if to support RTL CustomOps? E.g.:
if is_hls_node(node): | |
if is_hls_node(node): | |
... | |
else: | |
self.prepare_rtlsim() | |
# ensure that executable path is now set | |
assert ( | |
self.get_nodeattr("rtlsim_so") != "" | |
), "Failed to prepare RTLSim, no rtlsim_so attribute found." |
if mem_mode in ["internal_decoupled", "external"]: | ||
n_weight_inps = self.calc_wmem() | ||
# num_w_reps = np.prod(self.get_nodeattr("numInputVectors")) | ||
io_dict["inputs"]["weights"] = [0 for i in range(1 * n_weight_inps)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why num_w_reps
is no longer considered here. Could you explain?
For my experiments I switched to MVAUs with mem_mode "internal_embedded" instead of "internal_decoupled" for now to avoid potential problems.
"largefifo_rtlsim_cpp", | ||
"characterize_analytic", | ||
"characterize_rtl", | ||
], | ||
) | ||
@pytest.mark.parametrize("topology", ["tfc", "cnv"]) | ||
def test_fifosizing_linear(method, topology): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After incorporating some of my other comments, this test fails for me in the cases [cnv, characterize_analytic] and [cnv, characterize_rtl] during "step_set_fifo_depths" with
Exception: Error in simulation! Takes too long to produce output. Consider setting the LIVENESS_THRESHOLD env.var. to a larger value.
and
%Warning: ./MVAU_hls_6_Matrix_Vector_Activate_Stream_Batch_p_ZL7threshs_0_ROM_AUTO_1R.dat:0: $readmem file not found
I don't have an explicit LIVENESS_THRESHOLD set, so it should default to 10k.
Does this PR depend on the new auto-folding PR or other PRs?
Could the "internal_decoupled" mode be the culprit (see my other comment)?
@pytest.mark.fpgadataflow | ||
@pytest.mark.slow | ||
@pytest.mark.vivado | ||
def test_fpgadataflow_analytical_characterization_slidingwindow( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are 20.736 test cases. I only ran 20% of them and over 1.000 failed.
@pytest.mark.fpgadataflow | ||
@pytest.mark.slow | ||
@pytest.mark.vivado | ||
def test_fpgadataflow_analytical_characterization_fmpadding( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me 32/256 tests failed.
@pytest.mark.fpgadataflow | ||
@pytest.mark.slow | ||
@pytest.mark.vivado | ||
def test_fpgadataflow_analytical_characterization_streamingmaxpool( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me 20/128 tests failed
FIFO sizing is an extremely time-consuming process in terms of CPU cycles due to currently requiring RTL simulation of a model to determine the FIFO depths by tracking the behavior of the model.
Currently, FINN uses two main approaches for FIFO sizing:
The global sizing algorithm which incrementally tightens FIFO sizes while rerunning RTLSIM until a steady-state is reached of the entire model. (AutoFIFOSizingMethod.LARGEFIFO_RTLSIM)
The characteristic-function based approach which RTL simulates individual nodes and constructs characteristic functions for each one and then uses phase-shifting to determine an optimal FIFO size by finding how many cycles the input characteristic function must be shifted forward before it reaches a steady state with the output stream. (AutoFIFOSizingMethod.CHARACTERIZE)
Between these two options, the later - characteristic sizing can be dramatically sped up by computing the characteristic functions of individual nodes analytically, rather than by using RTLSIM.
This can be accomplished by manually reviewing the RTL (or HLS) code of each node and constructing a model python function which reproduces the loop behavior of only the AXI stream reads and writes, filling up the characteristic function array.
The PR includes these python-based model functions as part of the custom_ops classes of the following nodes:
Additionally, it includes modifications to /builder/build_dataflow_config.py and builder/build_dataflow_steps.py so that vivado is not called in the FIFO sizing step unless there is no characteristic function for an a node (and in that case it called to only characterize that respective node). This is achieved by introducing a new 'ipgen_ignore' node argument, which is set to true for all analytically characterized nodes once FIFO sizing is started and will force the FINN compiler to skip calling Vivado. This argument set back to false, allowing to call Vivado once the analytic FIFO sizing is finished.
Improvements to be made:
The remaining nodes in FINN should be characterized as necessary
There might exist parameter configurations for the convolutioninputgenerator and streamingmaxpool nodes where the characteristic function is inaccurate since an exhaustive search is complicated to do automatically.
Currently, the test for fifo sizing tests for exact correctness of the analytical function relative to the RTLSIM output. However, small latencies introduced at the start or end of the characteristic function by HLS do not lead to a change in final FIFO sizes. The test should be changed to compare the characteristic functions in a more relaxed manner. This would then allow to also perform an exhaustive test of all possible configurations for the nodes.
The characteristic FIFO sizing algorithm lacks support for branching networks. This is irrespective of the individual characteristic functions and should be improved in the main FIFO sizing function (transformation/derive_characteristic.py)