Merge pull request openhwgroup#956 from pascalgouedo/dev_dd_pgo_doc

Some User Manual updates.
YoannPruvost · Mar 12, 2024 · 1ad59cb
2 parents e0772a0 + 3a3b4c4
commit 1ad59cb

Showing 4 changed files with 23 additions and 11 deletions.
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -98,6 +98,9 @@
 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = None
 
+# Tags for conditional text
+#tags.add('USER')
+#tags.add('PMP')
 
 # -- Options for HTML output -------------------------------------------------
 

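The commented-out `tags.add()` calls above only take effect once uncommented. Sphinx makes a `tags` object available while it executes `conf.py`, and text wrapped in an `.. only:: USER` (or `.. only:: PMP`) directive is built only when that tag is set. Passing `-t USER` to `sphinx-build` has the same effect without editing the file. The fragment below is a hedged sketch, not part of this PR. It reads tags from a hypothetical `DOC_TAGS` environment variable and runs only inside a Sphinx build, not as a standalone script.

```python
# Sketch for docs/source/conf.py. `tags` is injected by Sphinx into the
# conf.py namespace, so this fragment only works during a Sphinx build.
import os

# Hypothetical DOC_TAGS variable, e.g. DOC_TAGS="USER,PMP", used as an
# alternative to uncommenting tags.add() or passing "-t USER -t PMP".
for tag in os.environ.get("DOC_TAGS", "").split(","):
    tag = tag.strip()
    if tag and not tags.has(tag):
        tags.add(tag)  # enables ".. only:: <tag>" blocks in the .rst sources
```

Driving the tags from the environment keeps a single `conf.py` for every documentation variant. Whether this suits the project's build flow is an assumption.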
diff --git a/docs/source/corev_hw_loop.rst b/docs/source/corev_hw_loop.rst
@@ -63,9 +63,12 @@ The HWLoop constraints are:
 
 -  HWLoop body must contain at least 3 instructions.
 
--  When both loops are nested, the End address of the outermost HWLoop (must be #1) must be at least 2
-   instructions further than the End address of the innermost HWLoop (must be #0),
-   i.e. HWLoop[1].endaddress >= HWLoop[0].endaddress + 8.
+-  When both loops are nested, at least 1 instruction must be present between the last instruction of the innermost HWLoop (must be #0)
+   and the last instruction of the outermost HWLoop (must be #1). In other words, the End address of the outermost HWLoop must be at least
+   8 bytes further than the End address of the innermost HWLoop (HWLoop[1].endaddress >= HWLoop[0].endaddress + 8).
+
+   In the example below, the first "addi %[j], %[j], 2;" instruction is the one added because of this constraint.
+   The code could have been simpler with a single "addi %[j], %[j], 4;" instruction, but it has been split into two instructions to respect this constraint.
 
 -  HWLoop must always be entered from its start location (no branch/jump to a location inside a HWLoop body).
 

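The nested-loop rule is stated in bytes, so it can be checked mechanically. The helpers below are a hypothetical sketch, not part of the CV32E40P sources. They assume that both end addresses use the same convention and that the bodies contain only 32-bit (uncompressed) instructions. The demo assumes the `addi` instructions directly follow the inner loop, with the last one closing the outer loop.

```python
# Hypothetical helpers (not part of the CV32E40P sources) that check the
# nested-HWLoop end-address rule on byte addresses.
INSTR_BYTES = 4  # assumption: only 32-bit (uncompressed) instructions


def nested_end_ok(inner_end: int, outer_end: int) -> bool:
    """HWLoop[1].endaddress >= HWLoop[0].endaddress + 8 (#0 inner, #1 outer)."""
    return outer_end >= inner_end + 8


def instructions_between_ends(inner_end: int, outer_end: int) -> int:
    """Instructions strictly between the last HWLoop[0] instruction and the
    last HWLoop[1] instruction; both ends must use the same convention."""
    return (outer_end - inner_end) // INSTR_BYTES - 1


if __name__ == "__main__":
    # Assumed layout: a single "addi %[j], %[j], 4;" closes the outer loop.
    print(nested_end_ok(0x100, 0x104), instructions_between_ends(0x100, 0x104))  # prints: False 0
    # Assumed layout: split into two "addi %[j], %[j], 2;" instructions.
    print(nested_end_ok(0x100, 0x108), instructions_between_ends(0x100, 0x108))  # prints: True 1
```

With compressed instructions in the body, the byte rule still applies, but the instruction count computed by `instructions_between_ends` would no longer be valid.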
diff --git a/docs/source/instruction_set_extensions.rst b/docs/source/instruction_set_extensions.rst
@@ -789,15 +789,15 @@ General ALU operations
   |                                           |                                                                        |
   |                                           | else rD = rs1                                                          |
   |                                           |                                                                        |
-  |                                           | Note: rs2 is unsigned.                                                 |
+  |                                           | Note: rs2 is unsigned and must be in the range (0x0-0x7FFFFFFF).       |
   +-------------------------------------------+------------------------------------------------------------------------+
   | **cv.clipur rD, rs1, rs2**                | if rs1 <= 0, rD = 0,                                                   |
   |                                           |                                                                        |
   |                                           | else if rs1 >= rs2, rD = rs2,                                          |
   |                                           |                                                                        |
   |                                           | else rD = rs1                                                          |
   |                                           |                                                                        |
-  |                                           | Note: rs2 is unsigned.                                                 |
+  |                                           | Note: rs2 is unsigned and must be in the range (0x0-0x7FFFFFFF).       |
   +-------------------------------------------+------------------------------------------------------------------------+
   | **cv.addN rD, rs1, rs2, Is3**             | rD = (rs1 + rs2) >>> Is3                                               |
   |                                           |                                                                        |

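The `cv.clipur` row can be read as a small reference model. The sketch below is hypothetical and is not taken from the RTL. It assumes that rs1 is interpreted as a signed 32-bit value, as implied by "rs1 <= 0", and it enforces the newly documented range for rs2.

```python
# Hypothetical reference model of the cv.clipur table row (not the RTL).
# Assumption: rs1 is read as a signed 32-bit value, as implied by "rs1 <= 0".

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cv_clipur(rs1: int, rs2: int) -> int:
    """Clip rs1 to [0, rs2]; rs2 must be in the range 0x0-0x7FFFFFFF."""
    if not 0 <= rs2 <= 0x7FFFFFFF:
        raise ValueError("rs2 must be in the range 0x0-0x7FFFFFFF")
    s1 = to_signed32(rs1)
    if s1 <= 0:          # negative or zero inputs clip to 0
        return 0
    if s1 >= rs2:        # values at or above the bound clip to rs2
        return rs2
    return s1            # in-range values pass through unchanged
```

Raising an error for an out-of-range rs2 is a modelling choice. The table states only the valid range and does not specify what the hardware returns outside it.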
diff --git a/docs/source/integration.rst b/docs/source/integration.rst
@@ -259,21 +259,27 @@ be provided.
 FPGA Synthesis
 ^^^^^^^^^^^^^^^
 
-FPGA synthesis is only supported for CV32E40P.
-The user needs to provide a technology specific implementation of a clock gating cell as described
-in :ref:`clock-gating-cell`.
+FPGA synthesis is supported for CV32E40P and has been successfully implemented with both AMD® Vivado® and Intel® Quartus® Prime Pro Edition.
+
+Because the CV32E40P RTL design uses some advanced SystemVerilog features, Intel® Quartus® Prime Standard Edition is not able to parse some of the CV32E40P SystemVerilog files.
+
+The user needs to provide a technology-specific implementation of a clock gating cell as described in :ref:`clock-gating-cell`.
 
 .. _synthesis_with_fpu:
 
 Synthesizing with the FPU
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-By default the pipeline of the FPU is purely combinatorial (FPU_*_LAT = 0). In this case FPU instructions latency is the same than simple ALU operations (except FP multicycle DIV/SQRT ones).
+By default the pipeline of the FPU is purely combinatorial (FPU_*_LAT = 0). In this case the latency of FPU instructions is the same as that of simple ALU operations (except for the multicycle FDIV/FSQRT ones).
But as FPU operations are much more complex than ALU ones, the maximum achievable frequency is much lower when the FPU is enabled.
+
While this can be fine for low-frequency systems, it is possible to specify how many pipeline registers are instantiated in the FPU to reach a higher target frequency.
-This is done with FPU_*_LAT CV32E40P parameters setting to perfectly fit target frequency.
+This is done by adjusting the FPU_*_LAT CV32E40P parameters to fit the target frequency.
+
It should be noted that any additional pipeline register increases FPU instruction latency and could degrade performance, depending on how heavily the application uses Floating-Point operations.
+
 Those pipeline registers are all added at the end of the FPU pipeline with all operators before them. Optimal frequency is only achievable using automatic retiming commands in implementation tools.
-This can be achieved with the following command for Synopsys Design Compiler:
+As an example, this can be done for Synopsys® Design Compiler with the following command:
+
 “set_optimize_registers true -designs [get_object_name [get_designs "\*cv32e40p_fp_wrapper\*"]]”.
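The latency versus frequency trade-off described above can be made concrete with a rough model. The sketch below is hypothetical and is not derived from the CV32E40P sources. It assumes that each FPU pipeline register delays a dependent instruction by exactly one cycle. It excludes FDIV/FSQRT and ignores other stalls and any independent instructions that could be scheduled in between. The frequencies in the demo are placeholders, not measured results.

```python
# Simplified, hypothetical timing model (not from the CV32E40P sources).
# Assumptions: every FPU pipeline register adds one cycle before a dependent
# instruction can use the result, FDIV/FSQRT are excluded, no other stalls.

def dependent_chain_time_ns(n_ops: int, fpu_lat: int, f_mhz: float) -> float:
    """Time for n_ops back-to-back dependent FP operations.

    With fpu_lat = 0 each result is ready after 1 cycle, like a simple ALU op.
    """
    cycles = n_ops * (1 + fpu_lat)
    return cycles * 1000.0 / f_mhz


if __name__ == "__main__":
    # Placeholder frequencies, not measured results:
    print(dependent_chain_time_ns(100, 0, 100.0))  # prints 1000.0
    print(dependent_chain_time_ns(100, 1, 200.0))  # prints 1000.0
```

Under these assumptions, a fully dependent chain only breaks even with one extra pipeline register if the clock frequency at least doubles. Code with fewer back-to-back dependencies loses less.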