diff --git a/docs/source/conf.py b/docs/source/conf.py
index bb18fa6a2..53badd35b 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -98,6 +98,9 @@
 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = None
 
+# Tags for conditional text
+#tags.add('USER')
+#tags.add('PMP')
 
 # -- Options for HTML output -------------------------------------------------
 
diff --git a/docs/source/corev_hw_loop.rst b/docs/source/corev_hw_loop.rst
index e51b0fe8f..2156ec410 100644
--- a/docs/source/corev_hw_loop.rst
+++ b/docs/source/corev_hw_loop.rst
@@ -63,9 +63,12 @@ The HWLoop constraints are:
 
 - HWLoop body must contain at least 3 instructions.
 
-- When both loops are nested, the End address of the outermost HWLoop (must be #1) must be at least 2
-  instructions further than the End address of the innermost HWLoop (must be #0),
-  i.e. HWLoop[1].endaddress >= HWLoop[0].endaddress + 8.
+- When both loops are nested, at least 1 instruction must be present between the last instruction of the innermost HWLoop (must be #0) and
+  the last instruction of the outermost HWLoop (must be #1). In other words, the End address of the outermost HWLoop must be at least 8
+  bytes further than the End address of the innermost HWLoop (HWLoop[1].endaddress >= HWLoop[0].endaddress + 8).
+
+  In the example below, the first "addi %[j], %[j], 2;" instruction is the one added due to this constraint.
+  The code could have been simpler by using only one "addi %[j], %[j], 4;" instruction, but to respect this constraint it has been split into two instructions.
 
 - HWLoop must always be entered from its start location (no branch/jump to a location inside a HWLoop body).
 
diff --git a/docs/source/instruction_set_extensions.rst b/docs/source/instruction_set_extensions.rst
index de3dc3ea8..625f48247 100644
--- a/docs/source/instruction_set_extensions.rst
+++ b/docs/source/instruction_set_extensions.rst
@@ -789,7 +789,7 @@ General ALU operations
   |                                           |                                                                        |
   |                                           | else rD = rs1                                                          |
   |                                           |                                                                        |
-  |                                           | Note: rs2 is unsigned.                                                 |
+  |                                           | Note: rs2 is unsigned and must be in the range (0x0-0x7FFFFFFF).       |
   +-------------------------------------------+------------------------------------------------------------------------+
   | **cv.clipur rD, rs1, rs2**                | if rs1 <= 0, rD = 0,                                                   |
   |                                           |                                                                        |
@@ -797,7 +797,7 @@ General ALU operations
   |                                           |                                                                        |
   |                                           | else rD = rs1                                                          |
   |                                           |                                                                        |
-  |                                           | Note: rs2 is unsigned.                                                 |
+  |                                           | Note: rs2 is unsigned and must be in the range (0x0-0x7FFFFFFF).       |
   +-------------------------------------------+------------------------------------------------------------------------+
   | **cv.addN rD, rs1, rs2, Is3**             | rD = (rs1 + rs2) >>> Is3                                               |
   |                                           |                                                                        |
diff --git a/docs/source/integration.rst b/docs/source/integration.rst
index d2fa3eac8..f355d2b1d 100644
--- a/docs/source/integration.rst
+++ b/docs/source/integration.rst
@@ -259,21 +259,27 @@ be provided.
 FPGA Synthesis
 ^^^^^^^^^^^^^^^
 
-FPGA synthesis is only supported for CV32E40P.
-The user needs to provide a technology specific implementation of a clock gating cell as described
-in :ref:`clock-gating-cell`.
+FPGA synthesis is supported for CV32E40P and it has been successfully implemented using both AMD® Vivado® and Intel® Quartus® Prime Pro Edition tools.
+
+Because the CV32E40P RTL design uses some advanced SystemVerilog features, Intel® Quartus® Prime Standard Edition is not able to parse some CV32E40P SystemVerilog files.
+
+The user needs to provide a technology specific implementation of a clock gating cell as described in :ref:`clock-gating-cell`.
 
 .. _synthesis_with_fpu:
 
 Synthesizing with the FPU
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-By default the pipeline of the FPU is purely combinatorial (FPU_*_LAT = 0). In this case FPU instructions latency is the same than simple ALU operations (except FP multicycle DIV/SQRT ones).
+By default the pipeline of the FPU is purely combinatorial (FPU_*_LAT = 0). In this case the latency of FPU instructions is the same as that of simple ALU operations (except the multicycle FDIV/FSQRT ones).
 But as FPU operations are much more complex than ALU ones, maximum achievable frequency is much lower than ALU one when FPU is enabled.
+
 If this can be fine for low frequency systems, it is possible to indicate how many pipeline registers are instantiated in the FPU to reach higher target frequency.
-This is done with FPU_*_LAT CV32E40P parameters setting to perfectly fit target frequency.
+This is done by adjusting the FPU_*_LAT CV32E40P parameters to fit the target frequency.
+
 It should be noted that any additional pipeline register is impacting FPU instructions latency and could cause performances degradation depending of applications using Floating-Point operations.
+
 Those pipeline registers are all added at the end of the FPU pipeline with all operators before them.
 Optimal frequency is only achievable using automatic retiming commands in implementation tools.
-This can be achieved with the following command for Synopsys Design Compiler:
+As an example, this can be done for Synopsys® Design Compiler with the following command:
+
 “set_optimize_registers true -designs [get_object_name [get_designs "\*cv32e40p_fp_wrapper\*"]]”.