From 76bcbd2af99f6ca062f04338f12f59c236f2a050 Mon Sep 17 00:00:00 2001
From: Edoardo Vacchi <evacchi@users.noreply.github.com>
Date: Mon, 12 Feb 2024 22:29:13 +0100
Subject: [PATCH] wip

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
---
 .../docs/how_the_optimizing_compiler_works.md | 202 +++++++++++-------
 1 file changed, 123 insertions(+), 79 deletions(-)

diff --git a/site/content/docs/how_the_optimizing_compiler_works.md b/site/content/docs/how_the_optimizing_compiler_works.md
index e729aa0193..f7ad669a87 100644
--- a/site/content/docs/how_the_optimizing_compiler_works.md
+++ b/site/content/docs/how_the_optimizing_compiler_works.md
@@ -15,32 +15,29 @@ Compare this to the **old compiler**, where compilation happens in one step or t
 
 
 ```goat
-
-                   +-------------------+      +-------------------+
-     Input         |                   |      |                   |
-  Wasm Binary  --->|   DecodeModule    |----->|   CompileModule   |---->  wazero IR
-                   |                   |      |                   |
-                   +-------------------+      +-------------------+
+            Input         +---------------+     +---------------+
+         Wasm Binary ---->| DecodeModule  |---->| CompileModule |----> wazero IR
+                          +---------------+     +---------------+
 ```
 
 That is, the module is (1) validated then (2) translated to an Intermediate Representation (IR).
 The wazero IR can then be executed directly (in the case of the interpreter) or it can be further processed and translated into native code by the compiler. This compiler performs a straightforward translation from the IR to native code, without any further passes. The wazero IR is not intended for further processing beyond immediate execution or straightforward translation.
 
 ```goat
-                +----   wazero IR    ----+
-                |                        |
-                v                        v
-        +--------------+         +--------------+
-        |   Compiler   |         | Interpreter  |- - -  executable
-        +--------------+         +--------------+
-                |
-     +----------+---------+
-     |                    |
-     v                    v
-+---------+          +---------+
-|  ARM64  |          |  AMD64  |
-| Backend |          | Backend |    - - - - - - - - -   executable
-+---------+          +---------+
+                        +----   wazero IR    ----+
+                        |                        |
+                        v                        v
+                +--------------+         +--------------+
+                |   Compiler   |         | Interpreter  |- - -  executable
+                +--------------+         +--------------+
+                        |
+             +----------+---------+
+             |                    |
+             v                    v
+        +---------+          +---------+
+        |  ARM64  |          |  AMD64  |
+        | Backend |          | Backend |    - - - - - - - - -   executable
+        +---------+          +---------+
 ```
 
 
@@ -62,24 +59,23 @@ The wazero optimizing compiler implements the following compilation passes:
   - Finalization and Encoding
 
 ```goat
-
-       Input          +-------------------+      +-------------------+
-    Wasm Binary   --->|   DecodeModule    |----->|   CompileModule   |--+
-                      +-------------------+      +-------------------+  |
-   +--------------------------------------------------------------------+
-   |
-   |  +---------------+                                +---------------+
-   +->|   Front-End   |------------------------------->|   Back-End    |
-      +---------------+                                +---------------+
-              |                                                |
-              v                                                v
-             SSA                                     Instruction Selection
-              |                                                |
-              v                                                v
-        Optimization                                  Registry Allocation
-                                                               |
-                                                               v
-                                                     Finalization/Encoding
+              Input          +-------------------+      +-------------------+
+           Wasm Binary   --->|   DecodeModule    |----->|   CompileModule   |--+
+                             +-------------------+      +-------------------+  |
+                    +----------------------------------------------------------+
+                    |
+                    |  +---------------+            +---------------+
+                    +->|   Front-End   |----------->|   Back-End    |
+                       +---------------+            +---------------+
+                               |                            |
+                               v                            v
+                              SSA                 Instruction Selection
+                               |                            |
+                               v                            v
+                         Optimization              Register Allocation
+                                                            |
+                                                            v
+                                                  Finalization/Encoding
 ```
 
 ## Front-End: Translation to SSA
@@ -110,64 +106,112 @@ For instance, take the following implementation of the `abs` function:
 This is translated to the following block diagram:
 
 ```goat
-       +---------------------------------------------+
-       |blk0: (exec_ctx:i64, module_ctx:i64, v2:i32) |
-       |    v3:i32 = Iconst_32 0x0                   |
-       |    v4:i32 = Icmp lt_s, v2, v3               |
-       |    Brz v4, blk2                             |
-       |    Jump blk1                                |
-       +---------------------------------------------+
-                              |
-                              |
-              +---(v4 != 0)---+--(v4 == 0)----+
-              |                               |
-              v                               v
-+---------------------------+   +---------------------------+
-|blk1: () <-- (blk0)        |   |blk2: () <-- (blk0)        |
-|    v6:i32 = Iconst_32 0x0 |   |    Jump blk3, v2          |
-|    v7:i32 = Isub v6, v2   |   |                           |
-|    Jump blk3, v7          |   |                           |
-+---------------------------+   +---------------------------+
-              |                               |
-              |                               |
-              +-{v5 := v7}----+---{v5 := v2}--+
-                              |
-                              v
-              +------------------------------+
-              |blk3: (v5:i32) <-- (blk1,blk2)|
-              |    Jump blk_ret, v5          |
-              +------------------------------+
-                              |
-                         {return v5}
-                              |
-                              v
+               +---------------------------------------------+
+               |blk0: (exec_ctx:i64, module_ctx:i64, v2:i32) |
+               |    v3:i32 = Iconst_32 0x0                   |
+               |    v4:i32 = Icmp lt_s, v2, v3               |
+               |    Brz v4, blk2                             |
+               |    Jump blk1                                |
+               +---------------------------------------------+
+                                      |
+                                      |
+                      +---(v4 != 0)---+--(v4 == 0)----+
+                      |                               |
+                      v                               v
+        +---------------------------+   +---------------------------+
+        |blk1: () <-- (blk0)        |   |blk2: () <-- (blk0)        |
+        |    v6:i32 = Iconst_32 0x0 |   |    Jump blk3, v2          |
+        |    v7:i32 = Isub v6, v2   |   |                           |
+        |    Jump blk3, v7          |   |                           |
+        +---------------------------+   +---------------------------+
+                      |                               |
+                      |                               |
+                      +-{v5 := v7}----+---{v5 := v2}--+
+                                      |
+                                      v
+                      +------------------------------+
+                      |blk3: (v5:i32) <-- (blk1,blk2)|
+                      |    Jump blk_ret, v5          |
+                      +------------------------------+
+                                      |
+                                 {return v5}
+                                      |
+                                      v
 ```
 
 We use the ["block argument" variant of SSA][ssa-blocks], which is also the same representation [used in LLVM's MLIR][llvm-mlir]. In this variant, each block takes a list of arguments. Each block ends with a jump instruction with an optional list of arguments; these arguments, are assigned to the target block's arguments like a function.
 
-Consider the first block `blk0`. You will notice that, compared to the original function, it takes two extra parameters (`exec_ctx` and `module_ctx`). It then takes one parameter `v2`, corresponding to the function parameter, and it defines two variables `v3`, `v4`. `v3` is the constant 0, `v4` is the result of comparing `v2` to `v3` using the `i32.lt_s` instruction.
+Consider the first block `blk0`.
+
+```
+blk0: (exec_ctx:i64, module_ctx:i64, v2:i32)
+    v3:i32 = Iconst_32 0x0
+    v4:i32 = Icmp lt_s, v2, v3
+    Brz v4, blk2
+    Jump blk1
+```
+
+You will notice that, compared to the original function, it takes two extra parameters (`exec_ctx` and `module_ctx`). It then takes one parameter `v2`, corresponding to the function parameter, and it defines two variables `v3`, `v4`. `v3` is the constant 0, and `v4` is the result of comparing `v2` to `v3` using the `i32.lt_s` instruction. Then, it branches to `blk2` if `v4` is zero; otherwise it jumps to `blk1`.
 
-You might also have noticed that the instructions do not correspond strictly to  the original Wasm opcodes. This is because, similarly to the wazero IR used by the old compiler, this is a custom IR.
+You might also have noticed that the instructions do not correspond strictly to the original Wasm opcodes. This is because, similarly to the wazero IR used by the old compiler, this is a custom IR. You will also notice that, _on the left-hand side of the assignments_, no name occurs _twice_: every value is defined exactly once, which is why this form is called **single-assignment**.
 
-You will also notice that, on the right-hand side of the assignments of any block, no name occurs twice: this is why this form is called "single-assignment".
+Finally, notice how `blk1` and `blk2` end with a jump to the last block `blk3`.
+
+```
+blk1: ()
+    ...
+    Jump blk3, v7
+
+blk2: ()
+    Jump blk3, v2
+
+blk3: (v5:i32)
+    ...
+```
 
+`blk3` takes an argument `v5`: `blk1` jumps to `blk3` with `v7` and `blk2` jumps to `blk3` with `v2`, meaning `v5` is effectively a rename of `v7` or `v2`, depending on the originating block. If you are familiar with the traditional representation of an SSA form, you will recognize that the role of block arguments is equivalent to the role of the *Phi (Φ) function*, a special function that returns a different value depending on the incoming edge; e.g., in this case: `v5 := Φ(v7, v2)`.
 
 
+## Front-End: Optimization
 
+The SSA form makes it easier to perform a number of optimizations. For instance, we can perform constant propagation, dead code elimination, and common subexpression elimination. These optimizations either act upon the instructions within a basic block, or they act upon the control-flow graph as a whole.
 
+At a high level, consider the following basic block, derived from the previous example:
 
+```
+blk0: (exec_ctx:i64, module_ctx:i64)
+    v2:i32 = Iconst_32 -5
+    v3:i32 = Iconst_32  0
+    v4:i32 = Icmp lt_s, v2, v3
+    Brz v4, blk2
+    Jump blk1
+```
+
+It is easy to see that the comparison in `v4` can be replaced by the constant `1`, because it compares two constant values (-5 and 0) and -5 is less than 0. Therefore, the block can be rewritten as follows:
+
+```
+blk0: (exec_ctx:i64, module_ctx:i64)
+    v4:i32 = Iconst_32 1
+    Brz v4, blk2
+    Jump blk1
+```
+
+However, we can now also see that `v4` is never zero, so the `Brz` branch to `blk2` is never taken and execution always falls through to `Jump blk1`. The block `blk2` is never executed, so even the branch instruction and the constant definition `v4` can be removed:
+
+```
+blk0: (exec_ctx:i64, module_ctx:i64)
+    Jump blk1
+```
 
+This is a simple example of constant propagation and dead code elimination occurring within a basic block. However, `blk2` is now unreachable, because there is no other edge in the graph that points to it; thus it can be removed from the control-flow graph. This is an example of dead-code elimination that occurs at the control-flow graph level.
 
-<!--
-which is equivalent to the traditional PHI function based one, but more convenient during optimizations.
-However, in this package's source code comment, we might use PHI whenever it seems necessary in order to be aligned with
-existing literatures, e.g. SSA level optimization algorithms are often described using PHI nodes.
+In practice, because WebAssembly is a compilation target, these simple optimizations are often unnecessary: the toolchain that produced the binary has usually applied them already. The optimization passes implemented in wazero are also a work in progress and, at the time of writing, further work is expected to implement more advanced optimizations.
 
+<!-- say more about block layout etc... -->
 
+## Back-End
 
-The algorithm to resolve variable definitions used here is based on the paper
-"Simple and Efficient Construction of Static Single Assignment Form": https://link.springer.com/content/pdf/10.1007/978-3-642-37051-9_6.pdf.
--->
+...
 
 [ssa-blocks]: https://en.wikipedia.org/wiki/Static_single-assignment_form#Block_arguments
 [llvm-mlir]: https://mlir.llvm.org/docs/Rationale/Rationale/#block-arguments-vs-phi-nodes