Project for the Code Optimization and Transformation course 2023 - integrate TAFFO and PandA-Bambu.
Perform the High-Level Synthesis (HLS) of programs whose floating-point operations have first been converted into fixed-point ones with the aid of TAFFO. The HLS itself is performed with PandA-Bambu.
See the following links for information regarding the two tools:
- TAFFO: https://github.com/TAFFO-org/TAFFO
- PandA-Bambu: https://github.com/ferrandi/PandA-bambu
Currently the repository contains various test cases built around floating-point operations. Some of the tests originate from examples provided by TAFFO or PandA-Bambu, as their names indicate.
Each test folder always contains the following files:
- test.c: The source code for the test;
- test.ll: The LLVM-IR representation of test.c obtained with TAFFO, usually with the value names preserved for readability;
- test.xml: Contains the values for PandA-Bambu's simulations;
- test_generator.py: Python script used to generate test.xml's content;
- interfaces.xml: Specifies the interface of the top function that will be synthesized;
- panda_log_opt.txt: Console output of PandA-Bambu. This is specific to the code optimized with TAFFO;
- panda_log.txt: Same as panda_log_opt.txt but contains the log for the code without utilizing TAFFO;
- taffo_err_log.txt: Report generated by TAFFO about the estimates performed by the Error Propagator, currently contains the relative numerical errors;
- results_opt.txt: Report generated by PandA-Bambu after the simulation of the generated design. A 1 in the first column indicates a successful simulation, while the second column reports the number of clock cycles the simulation took to complete. These are the results for the code optimized with TAFFO;
- results.txt: Same as results_opt.txt but contains the results for the code without utilizing TAFFO;
- notes.txt: Report on the status of the test, also containing additional comments and the procedures used to perform the test.
Some test folders contain subfolders, which recursively mirror the structure detailed above. Their purpose is to separate the results and HLS output of slightly different versions of the test.
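The two result files can be compared programmatically. Below is a minimal Python sketch. It assumes the layout described above: one simulation per line, with the success flag in the first whitespace-separated column and the clock-cycle count in the second. The function names are hypothetical and are not those of the repository's get_verilator_results.py.

```python
from pathlib import Path


def parse_results(text):
    """Parse a results file into a list of (success, clock_cycles) pairs.

    Assumes one simulation per line, with the success flag (1 = success) in
    the first whitespace-separated column and the cycle count in the second.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        rows.append((fields[0] == "1", int(fields[1])))
    return rows


def summarize(rows):
    """Return (all_simulations_passed, average_clock_cycles)."""
    if not rows:
        return False, 0.0
    passed = all(ok for ok, _ in rows)
    average = sum(cycles for _, cycles in rows) / len(rows)
    return passed, average


def compare_test(test_dir):
    """Summarize results.txt (no TAFFO) and results_opt.txt (TAFFO) of a test."""
    folder = Path(test_dir)
    plain = summarize(parse_results((folder / "results.txt").read_text()))
    optimized = summarize(parse_results((folder / "results_opt.txt").read_text()))
    return {"no_taffo": plain, "taffo": optimized}


if __name__ == "__main__":
    print(summarize(parse_results("1 120\n1 130\n")))  # (True, 125.0)
```

Averaging the cycle counts is only one possible summary. If the testbench inputs differ a lot in cost, a per-simulation comparison between the two files may be more informative.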
Here is the current list of tests:
- ComputeConvexHull: Computes the convex hull around the given set of points -> wikipedia <-;
- ComputeFFT: Computes the Fast Fourier Transform, implemented via the Cooley-Tukey radix-2 Decimation in Time (DIT) FFT algorithm -> wikipedia <-;
- ComputePi: Computes Pi with the Madhava-Leibniz formula -> wikipedia <-;
- ComputeSinCos: Computes $\sin(\pi/12)$ and $\cos(\pi/6)$ via the Taylor expansion of cosine -> wikipedia <-;
- ComputeSqrt: Computes the square root of a provided number with the Newton-Raphson method -> wikipedia <-;
- FromPanda_fft_float: Fast Fourier Transform example from here -> panda-github <-;
- FromPanda_mm_float: Matrix multiplication example from here -> panda-github <-;
- FromTaffo_axbench_fft: Fast Fourier Transform example from here -> taffo-github <-;
- FromTaffo_axbench_inversek2j: Forward and inverse kinematics example from here -> taffo-github <-;
- FromTaffo_fpbench_CX: Example from here -> taffo-github <-;
- FromTaffo_fpbench_CY: Example from here -> taffo-github <-;
- FromTaffo_fpbench_CarbonGas: Example from here -> taffo-github <-;
- FromTaffo_fpbench_doppler: Example from here -> taffo-github <-;
- FromTaffo_fpbench_instantCurrent: Example from here -> taffo-github <-;
- FromTaffo_fpbench_jetEngine: Example from here -> taffo-github <-;
- FromTaffo_fpbench_leadLag: Example from here -> taffo-github <-;
- FromTaffo_fpbench_triangle: Example from here -> taffo-github <-;
- FromTaffo_fpbench_turbine1: Example from here -> taffo-github <-;
- FromTaffo_test3: Example from here -> taffo-github <-;
- MatrixInversion: Computes the inverse of a matrix via the Gauss-Jordan method -> wikipedia <-;
- MDPPolicyIteration: Solves a simple Markov Decision Process via the Policy Iteration Method -> wikipedia <-;
- NormalizeVector: Transforms a vector to have unit norm;
- SimpleTaffoTest: A few trivial tests to verify that TAFFO and PandA-Bambu are working properly;
- TrainLogisticRegression: Performs the gradient descent training of a logistic regression machine learning model -> wikipedia <-;
Test variants ending in _outside_conv are designed not to synthesize the floating-point to fixed-point conversions (and back), in order to evaluate their impact.
Currently not all tests work under the same conditions; refer to the notes.txt files for the details of each test.
A major decision to be made during the process is where to place the floating-point to fixed-point conversion. Generating appropriate test values and verifying test results proved challenging when the conversion is not synthesized (I investigated this possibility in the "_outside_conv" variants, which take time to build). Therefore, all the following results, and those in the repository, assume that the conversion is synthesized and that I/O with the design is performed with floating-point values.
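Because the conversions are synthesized together with the design, it may help to see what one conversion involves. The Python sketch below models a generic signed fixed-point format with a hand-picked number of fractional bits. TAFFO instead chooses formats automatically from the annotated value ranges. The sketch illustrates quantization in general and is not TAFFO's generated code. The 32-bit total width and the saturation policy are assumptions. The relative error it measures is the same kind of metric that taffo_err_log.txt reports, although TAFFO estimates it statically rather than by running the code.

```python
def to_fixed(x, frac_bits, total_bits=32):
    """Quantize a float to a signed fixed-point integer.

    Rounds to nearest (Python's round() breaks ties to even) and saturates
    values outside the representable range.
    """
    scaled = round(x * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def round_trip_error(x, frac_bits, total_bits=32):
    """Relative error of a float -> fixed -> float round trip (requires x != 0)."""
    back = to_float(to_fixed(x, frac_bits, total_bits), frac_bits)
    return abs(back - x) / abs(x)


if __name__ == "__main__":
    # 16 fractional bits: the quantization step is 2**-16 (about 1.5e-5),
    # so the relative error grows as the magnitude of the value shrinks.
    for value in (3.14159, 0.001, 1e-6):
        print(value, round_trip_error(value, frac_bits=16))
```

The last value in the loop quantizes to 0, so its relative error is 1.0. This shows why the chosen format must match the actual range of the data.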
The latest comprehensive list of measured performance metrics on the various tests can be reviewed here:
--> link to the excel document <--
The specific versions of the tools used in this project were chosen so that both use LLVM-12, since it is the only LLVM version fully supported by both as of summer 2023:
- TAFFO: master branch pre-merge of spring 2023 -> taffo-master <-.
- PandA-Bambu: AppImage released in early 2023 -> panda-2023.1 <-.
- Vivado ML Edition - 2023.1: latest release of Xilinx Vivado -> official website <-. Note that this tool is not required unless you plan to enable it for a complete evaluation of the synthesized accelerators (see the "Run the tests" section).
Other versions of the tools were also occasionally used:
- TAFFO: develop branch as of summer 2023 -> taffo-develop <-. This version was used when the older one led to errors. However, it uses LLVM-15, so it is sometimes incompatible with PandA-Bambu's LLVM-12, and the older version was generally preferred. Refer to "notes.txt" to know whether its usage is required, and run the produced IR through "remove_fmuladd.py" if needed.
- PandA-Bambu: develop AppImage with loop pipelining and LLVM-16 support, from summer 2023 -> panda-dev-LP-direct-download <-. This version was tested on all tests and works fine, often even producing better results. However, the final results remain based on the 2023.1 release, for consistency and because that release allows matching LLVM versions: this develop AppImage dropped support for LLVM-12. Consequently, using this version requires changing PandA-Bambu's CLI option `--compiler=I386_CLANG12` to `--compiler=I386_CLANG16`.
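The repository's remove_fmuladd.py is not reproduced here. As a hedged sketch of the kind of rewrite involved, the Python function below expands calls to the `llvm.fmuladd` intrinsic into a separate `fmul` and `fadd`. This is a legal lowering, because `llvm.fmuladd` allows either fused or unfused evaluation.

The sketch makes several assumptions:
- It only handles the simple scalar form `%r = call float @llvm.fmuladd.f32(float %a, float %b, float %c)` (or the `double`/`f64` variant).
- Calls with fast-math flags or other modifiers are left untouched.
- Trailing call attributes and metadata are dropped.
- The `%fmuladd_tmpN` names are hypothetical and assumed not to clash with existing values.

```python
import re

# Matches e.g. "  %r = call float @llvm.fmuladd.f32(float %a, float %b, float %c)".
# Calls carrying fast-math flags or other modifiers do not match and are left as-is;
# anything after the closing parenthesis (attributes, metadata) is discarded.
CALL_RE = re.compile(
    r"^(?P<indent>\s*)(?P<res>%\S+) = (?:tail )?call (?P<ty>float|double) "
    r"@llvm\.fmuladd\.f(?:32|64)\((?P<args>[^)]*)\)"
)


def expand_fmuladd(ir_text):
    """Rewrite scalar llvm.fmuladd calls as an fmul followed by an fadd."""
    out = []
    counter = 0
    for line in ir_text.splitlines():
        if line.lstrip().startswith("declare") and "@llvm.fmuladd." in line:
            continue  # drop the intrinsic's declaration, now unused
        match = CALL_RE.match(line)
        if match is None:
            out.append(line)
            continue
        # Each argument reads "<type> [attributes] <operand>": keep the operand.
        a, b, c = (arg.split()[-1] for arg in match["args"].split(","))
        ty, indent = match["ty"], match["indent"]
        tmp = f"%fmuladd_tmp{counter}"  # hypothetical name, assumed unused in the IR
        counter += 1
        out.append(f"{indent}{tmp} = fmul {ty} {a}, {b}")
        out.append(f"{indent}{match['res']} = fadd {ty} {tmp}, {c}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "  %r = call float @llvm.fmuladd.f32(float %a, float %b, float %c)\n"
    print(expand_fmuladd(sample), end="")
```

Splitting the operation rounds twice instead of once. This matches what the unfused evaluation of `llvm.fmuladd` would produce anyway.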
The main conventions followed while developing the tests were:
- Write the program as a function; its call-graph will be entirely synthesized. Let's call it the "top function".
- Within the code to be synthesized, use only arrays whose size is known at compile time.
- Within the code to be synthesized, avoid I/O-related instructions (no printf and the like); rely only on the pointers given as parameters or on the return value.
- Within the code to be synthesized, avoid recursion; convert it into iteration.
- To return a scalar value from the top function, use "return". Otherwise, pass as an argument a pointer to an array where the results will be written. Avoid using "malloc".
- Use TAFFO's annotations only within the top function and the other functions it calls. Do not annotate the top function's definition itself, so that the floating-point to fixed-point conversion (and vice versa) is synthesized as well.
- An effective way to write the top function is to start by copying every received floating-point argument that you intend to convert into another instance of the same data structure, this one annotated for TAFFO. Then use this new instance throughout the function.
- Specify the interface of the top function in "interfaces.xml", giving the type and, where applicable, the size of every parameter. This is needed because a ".ll" file given to PandA-Bambu as input retains less information than its ".c" counterpart, and the missing information must be reinstated via "interfaces.xml". See the example below.
- Write in "test.xml" the testbenches for the code. See example below.
- Optionally, write a test-main that utilizes such function, giving it realistic inputs and printing the results.
Example:
```c
// filename: "test.c"
#define SIZE 4

void top_function(int size, float *v_float) {
    // Instantiate the vector that TAFFO will convert to fixed point
    float v[SIZE] __attribute__((annotate("target('v') scalar()")));
    // Copy the floating-point array into the one converted to fixed point
    for (int i = 0; i < size && i < SIZE; i++)
        v[i] = v_float[i];
    // Operate on v
    // Copy the fixed-point array back into the floating-point one
    for (int i = 0; i < size && i < SIZE; i++)
        v_float[i] = v[i];
}

int main() {
    float v_test[SIZE] = {1, 2, 3, 4};
    top_function(SIZE, v_test);
    return 0;
}
```
```xml
<?xml version="1.0"?>
<!-- filename: "interfaces.xml" -->
<module>
  <function id="top_function">
    <!-- Use "default" for scalars and "array" for array pointers -->
    <arg id="Pd5" interface_type="default" interface_typename="int" interface_typename_include="" interface_typename_orig="int"/>
    <!-- For "size" specify the maximum possible one as per the code constants -->
    <arg id="Pd6" interface_type="array" interface_typename="float*" interface_typename_include="" interface_typename_orig="float*" size="4"/>
  </function>
</module>
```
Note: Consider any pointer to a scalar value as an "array" of size 1.
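When a top function has many parameters, writing interfaces.xml by hand is error-prone. A minimal Python sketch using the standard library's `xml.etree.ElementTree` could generate it from a parameter list, following the attribute layout of the example above. The `(c_type, size)` description format and the `build_interfaces` name are hypothetical. The starting index 5 only reflects the usual case, so adjust it as described in the notes on "PdN" names.

```python
import xml.etree.ElementTree as ET


def build_interfaces(top_name, params, first_index=5):
    """Build an interfaces.xml tree in the layout of the example above.

    params lists (c_type, size) pairs in declaration order: size is None for
    scalars passed by value, otherwise the maximum array length (use 1 for a
    pointer to a single scalar).
    """
    module = ET.Element("module")
    function = ET.SubElement(module, "function", id=top_name)
    for offset, (c_type, size) in enumerate(params):
        attrs = {
            "id": f"Pd{first_index + offset}",
            "interface_type": "default" if size is None else "array",
            "interface_typename": c_type,
            "interface_typename_include": "",
            "interface_typename_orig": c_type,
        }
        if size is not None:
            attrs["size"] = str(size)
        ET.SubElement(function, "arg", attrs)
    return ET.ElementTree(module)


if __name__ == "__main__":
    tree = build_interfaces("top_function", [("int", None), ("float*", 4)])
    ET.indent(tree)  # pretty-printing, available since Python 3.9
    print('<?xml version="1.0"?>')
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To write the file directly, `tree.write("interfaces.xml", encoding="utf-8", xml_declaration=True)` works as well.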
```xml
<?xml version="1.0"?>
<!-- filename: "test.xml" -->
<function>
  <testbench
    Pd5="scalar"
    Pd6="{array_value_0, array_value_1, array_value_2, array_value_3}"
    size="scalar"
    v_float="{array_value_0, array_value_1, array_value_2, array_value_3}"
  />
</function>
<!-- in practice replace "scalar" and "array_value_N" with actual values -->
```
Note: When HLS starts from the LLVM-IR, parameter names must take the form "PdN", with "N" relative to the IR without preserved value names. When HLS starts from the C source, parameter names must exactly match those used in the C code. It is therefore useful to specify the values twice in the same testbench, once with the IR names and once with the C names, so that the same testbench works for both.
Note: For names in the form "PdN", "N" normally starts at "5" for the first parameter and increases by one for each subsequent parameter. However, this might not always be the case; refer to the error PandA-Bambu gives for missing parameters to find the initial value of "N". This applies to both "interfaces.xml" and "test.xml".
The most common issues solved while developing the tests were:
- If PandA-Bambu cannot find the specified top function, check that the name is correct within the IR file. If within the IR file the function is declared as "internal", change it to "dso_local".
- If PandA-Bambu cannot find the implementation of some function(s), check that any compiler flag related to the libraries to link is passed through to PandA-Bambu. This is often the case for "math.h". If the function in question is "fmuladd", run `remove_fmuladd.py <path/to/.ll>`. If it is any other LLVM intrinsic function, you will have to modify the IR manually to remove it.
- If Verilator's simulation fails, use verbosity level 4 for PandA-Bambu (the `-v 4` option) and check which outputs differ from the expected ones. One likely cause is floating-point values that are too large: try narrowing the upper and lower bounds of the testbench values below those specified in TAFFO's `scalar(range())`. Alternatively, ensure the `--libm-std-rounding` option is passed to PandA-Bambu, as it forces the use of the correct (but more costly) implementations of "math.h"'s functions.
- A single "stoull" printed after a simulation error means that a scalar value is written as an array (with curly braces) in the testbench values; remove the braces.
- If Verilator returns a size error, check that the testbenches initialize every function parameter in its entirety. Array sizes are static at compile time, and arrays must be fully specified even if they are not fully used.
- If the simulation cannot find some parameter, remember that when PandA-Bambu is given the IR file it expects parameters named progressively as PdN (where N usually starts at 5), while when it is given the C file it expects the parameter names used in the C code. Check the testbenches accordingly, and use the printed error to adjust the initial value of N if needed.
- If the simulation returns "ERROR: Co-sim: Memory parameter ... (...) mismatch with respect to gold reference." you have specified the wrong types or array sizes within "interfaces.xml".
- If PandA-Bambu gives a clang frontend error, the cause might be one of the following:
  - There are some arrays whose size is not static at compile time within the function(s) to be synthesized.
  - Currently, when TAFFO converts a floating-point division into fixed point, it is very likely to use integer types larger than "i64", such as "i96" and "i128", in the IR. PandA-Bambu cannot currently handle such values, which results in the clang frontend error. The TAFFO command described later in this document includes options that prevent their usage, but if the division's operands are too small, this might result in division-by-zero exceptions. As of now, the only solution to such issues is disabling TAFFO's optimization on the dividend and divisor variables with `__attribute__((annotate("scalar(disabled)")))`.
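The following Python sketch illustrates why divisions tend to need wide intermediate types, and why capping them at 64 bits can lead to division by zero. It uses generic fixed-point assumptions, not TAFFO's exact code generation. A common way to divide two fixed-point values is to pre-shift the dividend left by the fractional bits, so that the quotient keeps its precision. With operands that already use most of 64 bits, that shift overflows an `i64`. If the width is capped instead, fewer fractional bits are available, and a small divisor can truncate to 0. The formats used below are illustrative assumptions, and the function names are hypothetical.

```python
def quantize(x, frac_bits):
    """Convert a float to fixed point, truncating toward zero like a cast."""
    return int(x * (1 << frac_bits))


def fixed_div(a, b, frac_bits):
    """Divide two fixed-point values that both have frac_bits fractional bits.

    The dividend is shifted left by frac_bits first, so that the quotient
    keeps frac_bits fractional bits. Returns (quotient, width), where width
    is the signed bit width needed to hold the shifted dividend.
    """
    if b == 0:
        raise ZeroDivisionError("fixed-point divisor truncated to zero")
    shifted = a << frac_bits
    width = shifted.bit_length() + 1  # +1 for the sign bit
    magnitude = abs(shifted) // abs(b)  # truncating division, like sdiv
    return (magnitude if (shifted < 0) == (b < 0) else -magnitude), width


if __name__ == "__main__":
    # 32 fractional bits: 1000.0 / 0.5 is exact, but the shifted dividend
    # needs 75 bits, i.e. more than an i64 can hold.
    q, width = fixed_div(quantize(1000.0, 32), quantize(0.5, 32), 32)
    print(q / (1 << 32), width)  # 2000.0 75
    # Fewer fractional bits keep the intermediate small, but a tiny divisor
    # then quantizes to 0 and the division fails.
    try:
        fixed_div(quantize(1.0, 16), quantize(1e-6, 16), 16)
    except ZeroDivisionError as err:
        print("error:", err)
```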
Here are the main commands used to generate the LLVM-IR, run the HLS and the simulations.
- Produce the LLVM-IR optimized with TAFFO:
```bash
taffo -err-out taffo_err_log.txt -fno-discard-value-names -Xerr -relerror -S -emit-llvm -o test.ll test.c
```
Add the `-lm` option if `math.h` needs to be linked.
If types like `i96` or `i128` are generated in the LLVM-IR, add the options `-Xconversion -maxtotalbitsconv -Xconversion 64` and `-Xdta -maxtotalbits -Xdta 64`. This is because PandA-Bambu currently cannot handle such types. The first option prevents them from being used as intermediate types of multiplications and divisions, while the second prevents them from being used as the integer version of originally floating-point values.
- Run the HLS on TAFFO's produced LLVM-IR:
```bash
bambu-2023.1.AppImage test.ll --use-raw -v 2 --top-fname=<function_name_wrt_the_IR> --compiler=I386_CLANG12 --generate-interface=INFER --interface-xml-filename=interfaces.xml --simulate --simulator=VERILATOR --verilator-parallel |& tee panda_log_opt.txt
```
To specify a target device for later synthesis, use `--device-name=<name>`.
To see the input and output of each simulation, use a higher log verbosity: `-v 4`.
Add the `-lm -ffast-math --libm-std-rounding` options if `math.h` needs to be linked.
Also consider using `-fsingle-precision-constant -Os --experimental-setup=BAMBU`.
Consider as target device: `--device-name=xc7vx690t-3ffg1930-VVD`.
- Run the HLS on the original code:
```bash
bambu-2023.1.AppImage test.c -v 2 --top-fname=<function_name_wrt_the_sourcecode> --compiler=I386_CLANG12 --simulate --simulator=VERILATOR |& tee panda_log.txt
```
Add the `-lm -ffast-math --libm-std-rounding` options if `math.h` needs to be linked.
Consider as target device: `--device-name=xc7vx690t-3ffg1930-VVD`.
- Generate new testbench values:
```bash
python3 test_generator.py <args> > test.xml
```
Different generators might require some command-line arguments.
- Gather and print the errors, simulation results, flip-flop counts and Vivado's results summary:
```bash
python3 get_taffo_errors.py
python3 get_verilator_results.py
python3 get_flipflops_count.py
python3 get_vivado_results.py
```
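To avoid retyping the commands above with their optional flags, a small Python sketch can assemble them as argument lists. It only uses options listed in this section. The function names, and the mapping from keyword arguments to optional flags, are assumptions. Nothing is executed: the lists can be passed to `subprocess.run`. The `|& tee` redirection is shell syntax and is therefore left out.

```python
import shlex


def taffo_cmd(source="test.c", output="test.ll", needs_libm=False, cap_64_bits=False):
    """Build the TAFFO command that emits the optimized LLVM-IR."""
    cmd = ["taffo", "-err-out", "taffo_err_log.txt", "-fno-discard-value-names",
           "-Xerr", "-relerror", "-S", "-emit-llvm", "-o", output, source]
    if needs_libm:
        cmd.append("-lm")
    if cap_64_bits:
        # Keep i96/i128 out of the IR, since PandA-Bambu cannot handle them.
        cmd += ["-Xconversion", "-maxtotalbitsconv", "-Xconversion", "64",
                "-Xdta", "-maxtotalbits", "-Xdta", "64"]
    return cmd


def bambu_cmd(top, source, from_ir=True, needs_libm=False, device=None,
              verbosity=2, bambu="bambu-2023.1.AppImage"):
    """Build the PandA-Bambu HLS + Verilator simulation command."""
    cmd = [bambu, source]
    if from_ir:
        cmd.append("--use-raw")
    cmd += ["-v", str(verbosity), f"--top-fname={top}", "--compiler=I386_CLANG12"]
    if from_ir:
        cmd += ["--generate-interface=INFER", "--interface-xml-filename=interfaces.xml"]
    cmd += ["--simulate", "--simulator=VERILATOR"]
    if from_ir:
        cmd.append("--verilator-parallel")
    if needs_libm:
        cmd += ["-lm", "-ffast-math", "--libm-std-rounding"]
    if device is not None:
        cmd.append(f"--device-name={device}")
    return cmd


if __name__ == "__main__":
    # Print shell-ready command lines; pass the lists to subprocess.run to execute.
    print(shlex.join(taffo_cmd(needs_libm=True)))
    print(shlex.join(bambu_cmd("top_function", "test.ll", needs_libm=True,
                               device="xc7vx690t-3ffg1930-VVD")))
```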
Before running the tests:
```bash
sudo chmod +x run_tests.sh
export BAMBU=/path/to/Bambu.AppImage  # or the path to the Bambu binary
export TAFFO=/path/to/TAFFO  # only if TAFFO was not installed system-wide
```
Run the tests:
```bash
./run_tests.sh --use=all
```
This procedure will produce two sub-folders in each test’s folder:
- synthesis_with_opt
- synthesis_no_opt
They respectively contain the output of PandA-Bambu (and, if enabled, Vivado) for the test with TAFFO’s optimization and without it.
- `--use=all`: runs all the tests.
- `--use=<num>(,<num>)*`: specifies which tests to run. Simply run `./run_tests.sh` to get a list of the tests and their corresponding numbers.
- `--vivado`: enables the evaluation of the design through Vivado, after PandA-Bambu.
- `--device-name=<name>`: the same as in PandA-Bambu. If not specified, the default device is xc7vx690t-3ffg1930-VVD, which is the only one tested. Whatever device you specify, ensure your Vivado installation supports it if you also plan to use `--vivado`.
Copying from PandA-Bambu:
Specify the name of the device. Three different cases are foreseen:
- Xilinx: a comma separated string specifying device, speed grade and package (e.g. "xc7z020,-1,clg484,VVD")
- Altera: a string defining the device string (e.g. EP2C70F896C6)
- Lattice: a string defining the device string (e.g. LFE5U85F8BG756C)
- NanoXplore: a string defining the device string (e.g. nx2h540tsc)
- `--gen`: generates new random testbench inputs instead of using the existing ones (see the test.xml files).
- `--no-regen-taffo`: uses the existing TAFFO-optimized `.ll` files, without recompiling the source through TAFFO.
- `--no-opt`: does NOT run the TAFFO-optimized versions of the tests through PandA-Bambu (and Vivado, if enabled).
- `--no-unopt`: does NOT run the versions of the tests NOT optimized by TAFFO through PandA-Bambu (and Vivado, if enabled).
- `--opt-level=<level>`: passes the chosen optimization level through to TAFFO and PandA-Bambu. The available levels are "O0", "O1", "O2", "O3", "Os", "Of".
Using `--no-regen-taffo` is highly recommended, since some of the `.ll` files were produced with TAFFO's develop branch and others with the last version of TAFFO that used LLVM-12.
Using `--opt-level=O0` is highly recommended as well, to keep the results fair, since TAFFO does not necessarily optimize the IR the same way Bambu's internal LLVM version does.
Note that using both `--no-opt` and `--no-unopt` will just result in recompilation through TAFFO.
Example: a command that runs tests 1, 2 and 3, both with and without optimization, through PandA-Bambu and Vivado, without regenerating either the LLVM-IR or the testbench values:
```bash
./run_tests.sh --use=1,2,3 --vivado --no-regen-taffo
```
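The way the run_tests.sh flags combine can be summarized with the Python sketch below. It is a hypothetical model written from the option descriptions above, not a translation of run_tests.sh. The step names, their order and the `available` test numbers are illustrative assumptions.

```python
def select_tests(use, available):
    """Resolve --use=all or --use=<num>(,<num>)* into a list of test numbers."""
    if use == "all":
        return list(available)
    chosen = [int(token) for token in use.split(",")]
    unknown = [n for n in chosen if n not in available]
    if unknown:
        raise ValueError(f"unknown test numbers: {unknown}")
    return chosen


def stages(vivado=False, gen=False, no_regen_taffo=False, no_opt=False, no_unopt=False):
    """List the per-test steps implied by the flags (names and order are illustrative)."""
    steps = []
    if gen:
        steps.append("generate testbench")
    if not no_regen_taffo:
        steps.append("taffo")
    for variant, skipped in (("opt", no_opt), ("unopt", no_unopt)):
        if not skipped:
            steps.append(f"bambu {variant}")
            if vivado:
                steps.append(f"vivado {variant}")
    return steps


if __name__ == "__main__":
    # The example command above: tests 1, 2, 3 with --vivado --no-regen-taffo.
    for test in select_tests("1,2,3", range(1, 11)):
        print(test, stages(vivado=True, no_regen_taffo=True))
```

With both `no_opt` and `no_unopt` set, only the TAFFO step remains, which matches the note above on combining `--no-opt` and `--no-unopt`.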