Support for packed MV(A)Us #794

Closed
wants to merge 112 commits into from
112 commits
be1503a
First changes to custom_op for RTL-based MVAU
mmrahorovic Jan 3, 2023
8265985
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Apr 5, 2023
afab9cd
[rtl custom op]: initial implementation of mvu_8sx9
mmrahorovic Apr 6, 2023
a94fc3b
[rtl custom op]: testbench for mvu_8sx9
mmrahorovic Apr 6, 2023
98f9acc
[rtl custom op]: initial implementation of flow control component for…
mmrahorovic Apr 6, 2023
96925a9
[rtl custom op]: implementation of replay buffer for mvu
mmrahorovic Apr 6, 2023
a3d1156
[rtl custom op]: testbench for mvu_8sx9_axi (including axi_wrapper & …
mmrahorovic Apr 6, 2023
2aea664
[rtl custom op]: initial implementation of verilog wrapper for mvu_8s…
mmrahorovic Apr 6, 2023
c92e4e3
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Apr 6, 2023
8b57849
[rtl mvu]: fix tab indentation
mmrahorovic Apr 11, 2023
5e61f42
[rtl custom op]: fix to indentation
mmrahorovic Apr 12, 2023
cbee193
[rtl custom-op]: minor changes for compiler integration
mmrahorovic Apr 12, 2023
ba5e77b
[rtl custom op]: moved testbenches to separate directory
mmrahorovic Apr 12, 2023
69310b4
[rtl custom op]: fixed output width to ACCU_WIDTH
mmrahorovic Apr 12, 2023
cfcff00
[rtl custom op]: renamed file and added generic to switch between com…
mmrahorovic Apr 12, 2023
72b5196
[rtl custom op]: renamed file and added generic to switch between com…
mmrahorovic Apr 12, 2023
c068bb6
[rtl mvu]: added behavioral model DSP58
mmrahorovic May 8, 2023
18f94e7
[rtl mvu]: extended flow control wrapper with additional compute core…
mmrahorovic May 8, 2023
6d4a0a7
[rtl mvu]: fix to done_len flag when SIMD dimension fully unrolled an…
mmrahorovic May 8, 2023
90c547d
[rtl mvu tb]: updated testbench
mmrahorovic May 8, 2023
0c37f1f
[builder]: added specialize_to_rtl step and changed standalone thresh…
mmrahorovic May 8, 2023
5ccb016
[builder]: added specialize_to_rtl step
mmrahorovic May 8, 2023
f099f4b
[custom op]: added custom op MatrixVectorActivation_rtl
mmrahorovic May 8, 2023
9a3b0fd
[custom op]: added additional attribute to enable conversion to RTL (…
mmrahorovic May 8, 2023
38aa930
[custom op]: modified ip-stitching and code generation
mmrahorovic May 8, 2023
4e44934
[tests]: initial version of unit test for RTL custom op and specializ…
mmrahorovic May 8, 2023
cc361d9
[rtl mvu]: specialized compute core for 4-bit weights and activations…
mmrahorovic May 8, 2023
8eefb53
[rtl mvu]: specialized compute core for > 4-bit weights and activatio…
mmrahorovic May 8, 2023
e7109e7
[fpgadataflow transform]: initial specialize_to_rtl_layers-transform …
mmrahorovic May 8, 2023
d107b4d
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic May 9, 2023
5a868d1
[rtl mvu] fixes for latest memstream + linting
maltanar May 9, 2023
4a9cfa1
[rtl custom_op]: add support for external weights
mmrahorovic May 11, 2023
8a9ac1a
Specify clock and reset associations of bus interfaces.
preusser May 11, 2023
51bbe02
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic May 21, 2023
3d856b7
Merge branch 'dev' into feature/dsp_packing
preusser May 23, 2023
d9b9079
[rtlmvu] More fixes for memstream and param gen
maltanar May 15, 2023
a5f2a83
[Build] apply config to only FIFO nodes in step_set_fifo_depths
maltanar May 11, 2023
08cbdc5
Revised control interface attributes.
preusser May 24, 2023
48f0c5c
Merge branch 'dev' into feature/dsp_packing
preusser May 24, 2023
d058cc2
Mask device primitives from Verilator in favor of using behavioral code.
preusser May 24, 2023
a66f38f
[Deps] update qonnx
maltanar May 11, 2023
8f9bd04
Adding folding hints. Impl selection by case statement.
preusser May 24, 2023
8799707
Merge branch 'feature/verilator_workarounds' into feature/dsp_packing
preusser May 24, 2023
9de5ed6
Fixed behavioral sideband prediction.
preusser May 24, 2023
b6e92bb
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic May 24, 2023
239759a
[rtl mvu]: extension to allow selecting PE values that are not multip…
mmrahorovic May 24, 2023
8d3247c
[rtlmvu] Avoid unintentional verilator metacomments
maltanar May 24, 2023
ffc11d6
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic May 24, 2023
c866350
[rtl mvu]: extension to allow selecting PE values that are not multip…
mmrahorovic May 24, 2023
fd1e038
[rtl mvu axi]: updated comments on folding hints
mmrahorovic May 24, 2023
f60d4c6
[rtl custom op]: minor fixes to codegen
mmrahorovic Jun 2, 2023
a1ad304
[specialize-to-rtl]: add ram_style and rt_writeable_weights support
mmrahorovic Jun 2, 2023
2cbb68f
[rtllib]: change string type to parameter type due to Vivado error
mmrahorovic Jun 2, 2023
92eb0ed
[rtllib]: renamed variable for consistency
mmrahorovic Jun 2, 2023
471a221
Fix improper blocking assignment & linting.
preusser Jun 2, 2023
5c5dc09
[test rtl mvu]: modified/extended test cases
mmrahorovic Jun 2, 2023
b4eb9b6
[rtl mvu]: updated DSP58 >4-bit variant to lift SIMD%3==0 restriction
mmrahorovic Jun 30, 2023
ad63673
[rtl mvu]: bug fix for SIMD=1 init_leave_loads
mmrahorovic Jun 30, 2023
79e8a5e
[mvu rtl]: restrict index i to be less than 3 (within bounds of hi4)
mmrahorovic Jul 13, 2023
7be62b4
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Jul 17, 2023
e3493c3
Rewrite replay_buffer for input elasticity.
preusser Jun 2, 2023
44fae0c
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Jul 31, 2023
df51f11
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Aug 16, 2023
2efba68
[to-rtl]: Infer unique node names after transformation is applied
mmrahorovic Sep 5, 2023
114ea1b
[mvu rtl]: add synthesis directive to handle 'X in simulation
mmrahorovic Sep 18, 2023
79fafdb
[replay buffer rtl]: minor fix to when LEN=1 (= AWIDTH=0)
mmrahorovic Sep 18, 2023
619d9db
[mvu lut]: LUT-based MVU compute core
mmrahorovic Sep 18, 2023
090f2ac
[custom op]: add preferred_backend attribute
mmrahorovic Sep 19, 2023
ac5e82d
Ensure a minimum of two buffer slots even for length-1 sequences.
preusser Sep 21, 2023
d5ff2a2
Merge pull request #1 from Xilinx/bugfix/replay_len1
mmrahorovic Sep 21, 2023
bb94092
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic Sep 21, 2023
8515693
[rtl mvu wrapper]: support for vvu layer and rename
mmrahorovic Sep 21, 2023
cf28d78
[mvu vvu tb]: modified testbench to also support testing VVU on DSP58
mmrahorovic Sep 21, 2023
2617c39
[axi wrapper]: minor modification to comment description
mmrahorovic Sep 21, 2023
8ca5fe7
[mvu axi]: add support for VVU on DSP58
mmrahorovic Sep 21, 2023
32d6338
[mvu vvu axi]: renamed file for consistency purposes
mmrahorovic Sep 21, 2023
031406d
[mvu 8sx9]: added support for VVU on DSP58, resolved PyVerilator-caus…
mmrahorovic Sep 21, 2023
e2c1f15
[mvu vvu 8sx9]: renamed compute core for consistency
mmrahorovic Sep 21, 2023
adb5869
[axi wrapper]: changed parameter to localparam
mmrahorovic Sep 21, 2023
f54d438
[axi]: added support for LUT-based VVU
mmrahorovic Sep 21, 2023
a4e2ac7
[mvu vvu 8sx9]: minor change to list of generics
mmrahorovic Sep 21, 2023
40ad0b4
[mvu lut]: added support for VVU
mmrahorovic Sep 21, 2023
30fcb5b
[mvu vvu lut]: renamed file for consistency
mmrahorovic Sep 21, 2023
cb43438
Revert to proper address truncation without generation bit.
preusser Sep 21, 2023
b4b69f3
remove deletd/renamed files
mmrahorovic Sep 21, 2023
14c5fa9
[mvu vvu 8sx9]: renamed for consistency
mmrahorovic Sep 21, 2023
3a37588
[mvu vvu axi]: changes for renamed module
mmrahorovic Sep 21, 2023
afe36ba
[mvu vvu wrapper]: convert localparam to param
mmrahorovic Sep 25, 2023
e4f2f9e
[mvau-rtl custom-op]: bugfix to instantiate memstreamer, modified ren…
mmrahorovic Sep 25, 2023
b49b79a
[specialize to rtl]: fix to changed attribute name and added support …
mmrahorovic Sep 25, 2023
9bdba03
Adding core for DSP48 backport.
preusser Sep 19, 2023
2cf1ef7
[mvu rtl core]: added support for signed activations for DSP48-based …
mmrahorovic Sep 25, 2023
ab8d4a8
[rtl mvu custom-op]: add upper bound to SEGMENTLEN equal to number of…
mmrahorovic Sep 25, 2023
5a429fc
[mvu_vvu dsp58]: change weight input to 2D instead of 3D array
mmrahorovic Oct 13, 2023
a4a18bb
[mvu_vvu axi]: re-wire weights appropriately for VVU DSP58
mmrahorovic Oct 13, 2023
cc0737b
[mvu_vvu axi wrapper]: fix to IS_MVU parameter
mmrahorovic Oct 13, 2023
c0eff0b
[mvu_vvu tb]: WIP -- changes to self-checker and shape of input data
mmrahorovic Oct 13, 2023
cf7f494
[mvu vvu axi]: minor bugfixes to enable VVU
mmrahorovic Nov 1, 2023
5ffc221
[mvu vvu axi]: minor fix -- define mvauin_weight_t
mmrahorovic Nov 20, 2023
d573043
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Nov 27, 2023
40d652c
[rtl mvu op]: minor fix to chain length estimation and enabled behavi…
mmrahorovic Nov 29, 2023
6e98bac
[rtlsim]: use pyverilator util functions
mmrahorovic Dec 13, 2023
5dd74ad
[mvu vvu axi]: sign extend output tdata (byte-aligned)
mmrahorovic Dec 8, 2023
b20410b
[mvu core]: dsp48 convert unpacked array to packed array to work arou…
mmrahorovic Jan 8, 2024
1c2cc0c
[mvu axi]: update list of deduced parameters
mmrahorovic Jan 8, 2024
eeb3cea
[mvu custom-op]: remove lut-based implementation and update compute c…
mmrahorovic Jan 8, 2024
0813d14
[mvu axi]: remove LUT-based compute core
mmrahorovic Jan 8, 2024
4892d66
[hls custom-op]: enable reset in sim
mmrahorovic Jan 11, 2024
44f6e0f
[test mvu rtl]: updated test flow (DSP58 only)
mmrahorovic Jan 11, 2024
9b2cceb
[mvu vvu axi]: reworked flow control and backpressure handling by tpr…
mmrahorovic Jan 11, 2024
ee9f027
Adding DSP48E1 support for 8-bit compute. Todo: finer core differenti…
preusser Jan 31, 2024
3ab8296
Adding DSP48E1 support for 4-bit compute. Todo: finer core differenti…
preusser Jan 31, 2024
494 changes: 494 additions & 0 deletions finn-rtllib/mvu/mvu_4sx4u.sv

Large diffs are not rendered by default.

492 changes: 492 additions & 0 deletions finn-rtllib/mvu/mvu_8sx8u_dsp48.sv

Large diffs are not rendered by default.

430 changes: 430 additions & 0 deletions finn-rtllib/mvu/mvu_vvu_8sx9_dsp58.sv

Large diffs are not rendered by default.

234 changes: 234 additions & 0 deletions finn-rtllib/mvu/mvu_vvu_axi.sv
@@ -0,0 +1,234 @@
/******************************************************************************
* Copyright (C) 2022, Advanced Micro Devices, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. Neither the name of the copyright holder nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION). HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* @brief Matrix Vector Unit (MVU) & Vector Vector Unit (VVU) AXI-Stream interface wrapper.
* @details
* The following compute cores are supported:
* - 4-bit MVU on DSP48 & DSP58 achieving 4 MACs/DSP,
* - (4,8]-bit MVU on DSP48 achieving 2 MACs/DSP,
* - [4,9]-bit MVU and VVU on DSP58 achieving 3 MACs/DSP,
* - 'unconstrained' LUT-based MVU and VVU.
* Folding hints:
* - PE scaling should divide MH.
* - SIMD scaling should divide MW.
* - Otherwise, keep SIMD and PE somewhat balanced. SIMD scaling tends to
* impact critical paths more than PE scaling. PE scaling implies a
* bigger fanout on the input activations.
* - Full unfolding along MH (PE=MH) results in no replay buffer being instantiated.
*****************************************************************************/

module mvu_vvu_axi #(
bit IS_MVU,
parameter COMPUTE_CORE,
int unsigned MW,
int unsigned MH,
int unsigned PE,
int unsigned SIMD,
int unsigned ACTIVATION_WIDTH,
int unsigned WEIGHT_WIDTH,
int unsigned ACCU_WIDTH,
bit SIGNED_ACTIVATIONS = 0,
int unsigned SEGMENTLEN = 0,
bit FORCE_BEHAVIORAL = 0,
bit M_REG_LUT = 1,

// Safely deducible parameters
localparam int unsigned WEIGHT_STREAM_WIDTH = PE * SIMD * WEIGHT_WIDTH,
localparam int unsigned WEIGHT_STREAM_WIDTH_BA = (WEIGHT_STREAM_WIDTH + 7) / 8 * 8,
localparam int unsigned INPUT_STREAM_WIDTH = SIMD * ACTIVATION_WIDTH,
localparam int unsigned INPUT_STREAM_WIDTH_BA = (INPUT_STREAM_WIDTH + 7) / 8 * 8,
localparam int unsigned OUTPUT_STREAM_WIDTH = PE * ACCU_WIDTH,
localparam int unsigned OUTPUT_STREAM_WIDTH_BA = (OUTPUT_STREAM_WIDTH + 7) / 8 * 8,
localparam int unsigned SF = MW / SIMD,
localparam int unsigned NF = MH / PE
)
(
// Global Control
input logic ap_clk,
input logic ap_rst_n,

// Weight Stream
input logic [WEIGHT_STREAM_WIDTH_BA-1:0] s_axis_weights_tdata,
input logic s_axis_weights_tvalid,
output logic s_axis_weights_tready,

// Input Stream
input logic [INPUT_STREAM_WIDTH_BA-1:0] s_axis_input_tdata,
input logic s_axis_input_tvalid,
output logic s_axis_input_tready,

// Output Stream
output logic [OUTPUT_STREAM_WIDTH_BA-1:0] m_axis_output_tdata,
output logic m_axis_output_tvalid,
input logic m_axis_output_tready
);

//-------------------- Parameter sanity checks --------------------\\
initial begin
if (MW % SIMD != 0) begin
$error("Matrix width (%0d) is not a multiple of SIMD (%0d).", MW, SIMD);
$finish;
end
if (MH % PE != 0) begin
$error("Matrix height (%0d) is not a multiple of PE (%0d).", MH, PE);
$finish;
end
if (WEIGHT_WIDTH > 8) begin
$error("Weight width of %0d-bits exceeds maximum of 8-bits", WEIGHT_WIDTH);
$finish;
end
if (ACTIVATION_WIDTH > 8) begin
if (!(SIGNED_ACTIVATIONS == 1 && ACTIVATION_WIDTH == 9 && COMPUTE_CORE == "mvu_vvu_8sx9_dsp58")) begin
$error("Activation width of %0d-bits exceeds maximum of 8-bits (9-bits is supported only for signed activations on DSP58)", ACTIVATION_WIDTH);
$finish;
end
end
if (COMPUTE_CORE == "mvu_vvu_8sx9_dsp58") begin
if (SEGMENTLEN == 0) begin
$warning("Segment length of %0d defaults to chain length of %0d", SEGMENTLEN, (SIMD+2)/3);
end
if (SEGMENTLEN > (SIMD+2)/3) begin
$error("Segment length of %0d exceeds chain length of %0d", SEGMENTLEN, (SIMD+2)/3);
$finish;
end
end
end

uwire clk = ap_clk;
uwire rst = !ap_rst_n;

//- Replay to Accommodate Neuron Fold -----------------------------------
typedef logic [PE*SIMD-1:0][ACTIVATION_WIDTH-1:0] mvu_flatin_t;
uwire mvu_flatin_t amvau;
uwire alast;
uwire afin;
uwire avld;
uwire ardy;

replay_buffer #(.LEN(SF), .REP(NF), .W($bits(mvu_flatin_t))) activation_replay (
.clk, .rst,
.ivld(s_axis_input_tvalid), .irdy(s_axis_input_tready), .idat(mvu_flatin_t'(s_axis_input_tdata)),
.ovld(avld), .ordy(ardy), .odat(amvau), .olast(alast), .ofin(afin)
);

//- Unflatten inputs into structured matrices ---------------------------
typedef logic [PE-1:0][SIMD-1:0][WEIGHT_WIDTH -1:0] mvu_w_t;
typedef logic [SIMD-1:0][ACTIVATION_WIDTH-1:0] mvu_a_t;

uwire mvu_w_t mvu_w = s_axis_weights_tdata;
uwire mvu_a_t mvu_a = amvau;

//- Flow Control Bracket around Compute Core ----------------------------
uwire en;
uwire istb = avld && s_axis_weights_tvalid;
assign ardy = en && s_axis_weights_tvalid;
assign s_axis_weights_tready = en && avld;

//- Instantiate compute core ----------------------------
typedef logic [PE-1:0][ACCU_WIDTH-1:0] dsp_p_t;
uwire dsp_vld;
uwire dsp_p_t dsp_p;

uwire dsp_clk = ap_clk;
uwire dsp_en = en;
uwire dsp_last = alast && avld;
uwire dsp_zero = !istb;
uwire mvu_w_t dsp_w = mvu_w;
uwire mvu_a_t dsp_a = mvu_a;
uwire ovld = dsp_vld;
uwire dsp_p_t odat = dsp_p;

case(COMPUTE_CORE)
"mvu_vvu_8sx9_dsp58":
mvu_vvu_8sx9_dsp58 #(.IS_MVU(IS_MVU), .PE(PE), .SIMD(SIMD), .ACTIVATION_WIDTH(ACTIVATION_WIDTH), .WEIGHT_WIDTH(WEIGHT_WIDTH),
.ACCU_WIDTH(ACCU_WIDTH), .SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .SEGMENTLEN(SEGMENTLEN),
.FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
"mvu_4sx4u":
mvu_4sx4u #(.PE(PE), .SIMD(SIMD), .ACCU_WIDTH(ACCU_WIDTH), .SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
"mvu_8sx8u_dsp48":
mvu_8sx8u_dsp48 #(.PE(PE), .SIMD(SIMD), .ACCU_WIDTH(ACCU_WIDTH), .ACTIVATION_WIDTH(ACTIVATION_WIDTH), .WEIGHT_WIDTH(WEIGHT_WIDTH),
.SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
default: initial begin
$error("Unrecognized COMPUTE_CORE '%s'", COMPUTE_CORE);
$finish;
end
endcase

//-------------------- Output register slice --------------------\\
// Make `en` computation independent from external inputs.
// Drive all outputs from registers.
struct packed {
logic rdy;
logic [PE-1:0][ACCU_WIDTH-1:0] dat;
} A = '{ rdy: 1, default: 'x }; // side-step register used when encountering backpressure
struct packed {
logic vld;
logic [PE-1:0][ACCU_WIDTH-1:0] dat;
} B = '{ vld: 0, default: 'x }; // ultimate output register

assign en = A.rdy;
uwire b_load = !B.vld || m_axis_output_tready;

always_ff @(posedge clk) begin
if(rst) begin
A <= '{ rdy: 1, default: 'x };
B <= '{ vld: 0, default: 'x };
end
else begin
if(A.rdy) A.dat <= odat;
A.rdy <= (A.rdy && !ovld) || b_load;

if(b_load) begin
B <= '{
vld: ovld || !A.rdy,
dat: A.rdy? odat : A.dat
};
end
end
end
assign m_axis_output_tvalid = B.vld;
// Why would we need a sign extension here potentially creating a higher signal load into the next FIFO?
// These extra bits should never be used. Why not 'x them out?
assign m_axis_output_tdata = { {(OUTPUT_STREAM_WIDTH_BA-OUTPUT_STREAM_WIDTH){B.dat[PE-1][ACCU_WIDTH-1]}}, B.dat};


endmodule : mvu_vvu_axi
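The output register slice at the end of `mvu_vvu_axi` can be checked with a small cycle-level model. The following is a minimal Python sketch, not part of this PR: the names `OutputSlice` and `simulate` are hypothetical, and the source is assumed to behave like a compute core whose registered `vld`/`p` outputs only advance on clock edges where `en` is high. It mirrors the next-state equations of registers `A` and `B` and pushes a sequence through under arbitrary backpressure.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class OutputSlice:
    """Behavioral stand-in for registers A (side-step) and B (output)."""
    a_rdy: bool = True
    a_dat: Optional[int] = None
    b_vld: bool = False
    b_dat: Optional[int] = None

    def step(self, ovld: bool, odat: Optional[int], tready: bool) -> None:
        """Advance one clock edge; every next-state value uses the old state."""
        b_load = (not self.b_vld) or tready
        held = odat if self.a_rdy else self.a_dat      # A.rdy ? odat : A.dat
        next_a_rdy = (self.a_rdy and not ovld) or b_load
        if b_load:
            self.b_vld = ovld or not self.a_rdy
            self.b_dat = held
        self.a_dat = held            # A captures odat only while A.rdy is set
        self.a_rdy = next_a_rdy


def simulate(items: Sequence[int], valid: Sequence[bool],
             ready: Sequence[bool], max_cycles: int = 10_000) -> List[int]:
    """Push items through the slice and return what the consumer receives.

    `valid` and `ready` are repeating patterns (assumed stimulus, not from
    the PR): `valid` gates the source, `ready` drives m_axis_output_tready.
    """
    slc = OutputSlice()
    pending = list(items)
    src_vld, src_dat = False, None   # models the core's registered outputs
    received: List[int] = []
    enabled_edges = 0
    for cycle in range(max_cycles):
        tready = ready[cycle % len(ready)]
        if slc.b_vld and tready:     # AXI-Stream handshake on the output
            received.append(slc.b_dat)
        en = slc.a_rdy               # assign en = A.rdy
        slc.step(src_vld, src_dat, tready)
        if en:                       # a stalled core keeps its outputs
            if pending and valid[enabled_edges % len(valid)]:
                src_vld, src_dat = True, pending.pop(0)
            else:
                src_vld, src_dat = False, None
            enabled_edges += 1
        if not pending and not src_vld and not slc.b_vld and slc.a_rdy:
            break
    return received
```

In this model `en` depends only on the registered `A.rdy` bit, which is the property the RTL comment asks for: a deasserted `m_axis_output_tready` reaches the core one cycle later, and the word produced in that cycle is parked in `A` rather than dropped.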
92 changes: 92 additions & 0 deletions finn-rtllib/mvu/mvu_vvu_axi_wrapper.v
@@ -0,0 +1,92 @@
/******************************************************************************
* Copyright (C) 2022, Advanced Micro Devices, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. Neither the name of the copyright holder nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION). HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* @brief Verilog AXI-Stream wrapper for MVU & VVU.
*****************************************************************************/

module $MODULE_NAME_AXI_WRAPPER$ #(
parameter IS_MVU = $IS_MVU$,
parameter COMPUTE_CORE = "$COMPUTE_CORE$",
parameter MW = $MW$,
parameter MH = $MH$,
parameter PE = $PE$,
parameter SIMD = $SIMD$,
parameter ACTIVATION_WIDTH = $ACTIVATION_WIDTH$,
parameter WEIGHT_WIDTH = $WEIGHT_WIDTH$,
parameter ACCU_WIDTH = $ACCU_WIDTH$,
parameter SIGNED_ACTIVATIONS = $SIGNED_ACTIVATIONS$,
parameter SEGMENTLEN = $SEGMENTLEN$,
parameter FORCE_BEHAVIORAL = $FORCE_BEHAVIORAL$,

// Safely deducible parameters
parameter WEIGHT_STREAM_WIDTH_BA = (PE*SIMD*WEIGHT_WIDTH+7)/8 * 8,
parameter INPUT_STREAM_WIDTH_BA = ((IS_MVU == 1 ? 1 : PE) * SIMD * ACTIVATION_WIDTH + 7) / 8 * 8,
parameter OUTPUT_STREAM_WIDTH_BA = (PE*ACCU_WIDTH + 7)/8 * 8
)(
// Global Control
(* X_INTERFACE_PARAMETER = "ASSOCIATED_BUSIF weights_V:in0_V:out_V, ASSOCIATED_RESET ap_rst_n" *)
(* X_INTERFACE_INFO = "xilinx.com:signal:clock:1.0 ap_clk CLK" *)
input ap_clk,
(* X_INTERFACE_PARAMETER = "POLARITY ACTIVE_LOW" *)
input ap_rst_n,

// Weight Stream
input [WEIGHT_STREAM_WIDTH_BA-1:0] weights_V_TDATA,
input weights_V_TVALID,
output weights_V_TREADY,
// Input Stream
input [INPUT_STREAM_WIDTH_BA-1:0] in0_V_TDATA,
input in0_V_TVALID,
output in0_V_TREADY,
// Output Stream
output [OUTPUT_STREAM_WIDTH_BA-1:0] out_V_TDATA,
output out_V_TVALID,
input out_V_TREADY
);

mvu_vvu_axi #(
.IS_MVU(IS_MVU), .COMPUTE_CORE(COMPUTE_CORE), .MW(MW), .MH(MH), .PE(PE), .SIMD(SIMD),
.ACTIVATION_WIDTH(ACTIVATION_WIDTH), .WEIGHT_WIDTH(WEIGHT_WIDTH), .ACCU_WIDTH(ACCU_WIDTH),
.SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .SEGMENTLEN(SEGMENTLEN), .FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)
) inst (
.ap_clk(ap_clk),
.ap_rst_n(ap_rst_n),
.s_axis_weights_tdata(weights_V_TDATA),
.s_axis_weights_tvalid(weights_V_TVALID),
.s_axis_weights_tready(weights_V_TREADY),
.s_axis_input_tdata(in0_V_TDATA),
.s_axis_input_tvalid(in0_V_TVALID),
.s_axis_input_tready(in0_V_TREADY),
.m_axis_output_tdata(out_V_TDATA),
.m_axis_output_tvalid(out_V_TVALID),
.m_axis_output_tready(out_V_TREADY)
);

endmodule // $MODULE_NAME_AXI_WRAPPER$
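The "safely deducible" parameters in the wrapper, together with the fold checks in `mvu_vvu_axi`, amount to a few integer formulas. As a rough cross-check, they could be reproduced on the Python side as below. This is a sketch under assumptions: `byte_align`, `StreamParams` and `derive_stream_params` are hypothetical names, not FINN API, and the input width follows the wrapper's formula, which includes the `PE` factor for the VVU case.

```python
from dataclasses import dataclass


def byte_align(bits: int) -> int:
    """Round a bit width up to the next multiple of 8, like (x + 7) / 8 * 8."""
    return (bits + 7) // 8 * 8


@dataclass(frozen=True)
class StreamParams:
    weight_stream_width_ba: int
    input_stream_width_ba: int
    output_stream_width_ba: int
    sf: int  # synapse fold, MW / SIMD
    nf: int  # neuron fold, MH / PE


def derive_stream_params(*, is_mvu: bool, mw: int, mh: int, pe: int,
                         simd: int, activation_width: int,
                         weight_width: int, accu_width: int) -> StreamParams:
    """Mirror the wrapper's width formulas and the MW/MH divisibility checks."""
    if mw % simd != 0:
        raise ValueError(f"Matrix width ({mw}) is not a multiple of SIMD ({simd}).")
    if mh % pe != 0:
        raise ValueError(f"Matrix height ({mh}) is not a multiple of PE ({pe}).")
    # The wrapper widens the input stream by PE only when IS_MVU is 0 (VVU).
    input_lanes = (1 if is_mvu else pe) * simd
    return StreamParams(
        weight_stream_width_ba=byte_align(pe * simd * weight_width),
        input_stream_width_ba=byte_align(input_lanes * activation_width),
        output_stream_width_ba=byte_align(pe * accu_width),
        sf=mw // simd,
        nf=mh // pe,
    )
```

For instance, with `PE=2`, `SIMD=3`, 4-bit weights and activations, and a 16-bit accumulator, the MVU input stream carries 12 payload bits padded to 16, while the weight and output streams are already byte-aligned at 24 and 32 bits.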