Commit 76bcbd2 ("wip"), committed by evacchi on Feb 12, 2024 (parent 4964a41). Signed-off-by: Edoardo Vacchi

Changed file: `site/content/docs/how_the_optimizing_compiler_works.md` (123 additions, 79 deletions)
Compare this to the **old compiler**, where compilation happens in one step or t...


```goat
Input            +---------------+     +---------------+
Wasm Binary ---->| DecodeModule  |---->| CompileModule |----> wazero IR
                 +---------------+     +---------------+
```

That is, the module is (1) validated and then (2) translated to an Intermediate Representation (IR).
The wazero IR can then be executed directly (in the case of the interpreter) or it can be further processed and translated into native code by the compiler. This compiler performs a straightforward translation from the IR to native code, without any further passes. The wazero IR is not intended for further processing beyond immediate execution or straightforward translation.

```goat
            +---- wazero IR ----+
            |                   |
            v                   v
     +--------------+     +--------------+
     |   Compiler   |     | Interpreter  |- - - executable
     +--------------+     +--------------+
            |
     +------+------+
     |             |
     v             v
+---------+   +---------+
|  ARM64  |   |  AMD64  |
| Backend |   | Backend | - - - - - - - - - - - executable
+---------+   +---------+
```


...

The wazero optimizing compiler implements the following compilation passes:

- ...
- Finalization and Encoding

```goat
Input           +-------------------+      +-------------------+
Wasm Binary --->|   DecodeModule  |----->|  CompileModule  |--+
                +-------------------+      +-------------------+  |
+-------------------------------------------------------------+
|
|    +---------------+          +---------------+
+--->|   Front-End   |--------->|   Back-End    |
     +---------------+          +---------------+
             |                          |
             v                          v
            SSA               Instruction Selection
             |                          |
             v                          v
       Optimization            Register Allocation
                                        |
                                        v
                              Finalization/Encoding
```
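To make the ordering concrete, here is a minimal Go sketch of a pass pipeline. It assumes each pass is simply a function that rewrites the function under compilation in place. The `Function` and `Pass` types, `logPass`, and `optimizingPipeline` are hypothetical names for illustration only, not wazero's actual API, and each placeholder pass only records its own name.

```go
package main

import "fmt"

// Function is a hypothetical stand-in for a function under compilation.
// In this sketch it only records which passes have run on it.
type Function struct {
	Name string
	Log  []string
}

// Pass is a hypothetical compilation pass that rewrites a Function in place.
type Pass struct {
	Name string
	Run  func(*Function)
}

// logPass builds a placeholder pass whose only effect is to record its name.
func logPass(name string) Pass {
	return Pass{Name: name, Run: func(f *Function) { f.Log = append(f.Log, name) }}
}

// optimizingPipeline lists the passes in the order shown in the diagram:
// the front-end passes produce SSA that the back-end passes consume.
func optimizingPipeline() []Pass {
	return []Pass{
		logPass("Front-End: SSA"),
		logPass("Front-End: Optimization"),
		logPass("Back-End: Instruction Selection"),
		logPass("Back-End: Register Allocation"),
		logPass("Finalization/Encoding"),
	}
}

// compile runs every pass, in order, over the same function.
func compile(f *Function, passes []Pass) {
	for _, p := range passes {
		p.Run(f)
	}
}

func main() {
	f := &Function{Name: "abs"}
	compile(f, optimizingPipeline())
	for i, step := range f.Log {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```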

## Front-End: Translation to SSA
...

For instance, take the following implementation of the `abs` function:

...
This is translated to the following block diagram:

```goat
       +---------------------------------------------+
       |blk0: (exec_ctx:i64, module_ctx:i64, v2:i32) |
       |    v3:i32 = Iconst_32 0x0                   |
       |    v4:i32 = Icmp lt_s, v2, v3               |
       |    Brz v4, blk2                             |
       |    Jump blk1                                |
       +---------------------------------------------+
                              |
                              |
              +---(v4 != 0)---+---(v4 == 0)---+
              |                               |
              v                               v
+---------------------------+   +---------------------------+
|blk1: () <-- (blk0)        |   |blk2: () <-- (blk0)        |
|    v6:i32 = Iconst_32 0x0 |   |    Jump blk3, v2          |
|    v7:i32 = Isub v6, v2   |   |                           |
|    Jump blk3, v7          |   |                           |
+---------------------------+   +---------------------------+
              |                               |
              |                               |
              +---{v5 := v7}--+--{v5 := v2}---+
                              |
                              v
              +------------------------------+
              |blk3: (v5:i32) <-- (blk1,blk2)|
              |    Jump blk_ret, v5          |
              +------------------------------+
                              |
                         {return v5}
                              |
                              v
```

We use the ["block argument" variant of SSA][ssa-blocks], which is also the representation [used in LLVM's MLIR][llvm-mlir]. In this variant, each block takes a list of arguments. Each block ends with a jump instruction with an optional list of arguments; these arguments are assigned to the target block's parameters, just like the arguments of a function call.
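To illustrate how jump arguments flow into block parameters, here is a minimal Go sketch of an interpreter for the `abs` example. The `Block` and `Jump` types, the use of closures for block bodies, and the `blk_ret` convention for returning a value are assumptions made for illustration. This is not how wazero represents or executes its SSA, and the `exec_ctx`/`module_ctx` parameters are omitted.

```go
package main

import "fmt"

// Jump transfers control to Target, passing Args positionally to the
// target block's parameters, like the arguments of a function call.
type Jump struct {
	Target string
	Args   []string
}

// Block is a hypothetical, heavily simplified SSA block: a list of
// parameters plus a body that computes values and returns its final jump.
type Block struct {
	Params []string
	Body   func(env map[string]int32) Jump
}

// run executes blocks from entry until a jump to "blk_ret", whose first
// argument is the result. Because every SSA name is defined exactly once,
// one flat environment is enough to hold all values.
func run(blocks map[string]Block, entry string, args []int32) int32 {
	env := map[string]int32{}
	cur, incoming := entry, args
	for {
		b := blocks[cur]
		for i, p := range b.Params {
			env[p] = incoming[i] // bind jump arguments to block parameters
		}
		j := b.Body(env)
		next := make([]int32, 0, len(j.Args))
		for _, a := range j.Args {
			next = append(next, env[a])
		}
		if j.Target == "blk_ret" {
			return next[0]
		}
		cur, incoming = j.Target, next
	}
}

// absBlocks encodes the abs example block by block.
func absBlocks() map[string]Block {
	return map[string]Block{
		"blk0": {Params: []string{"v2"}, Body: func(env map[string]int32) Jump {
			env["v3"] = 0 // Iconst_32 0x0
			lt := int32(0)
			if env["v2"] < env["v3"] {
				lt = 1
			}
			env["v4"] = lt // Icmp lt_s, v2, v3
			if env["v4"] == 0 {
				return Jump{Target: "blk2"} // Brz v4, blk2
			}
			return Jump{Target: "blk1"} // Jump blk1
		}},
		"blk1": {Body: func(env map[string]int32) Jump {
			env["v6"] = 0                     // Iconst_32 0x0
			env["v7"] = env["v6"] - env["v2"] // Isub v6, v2
			return Jump{Target: "blk3", Args: []string{"v7"}}
		}},
		"blk2": {Body: func(env map[string]int32) Jump {
			return Jump{Target: "blk3", Args: []string{"v2"}}
		}},
		"blk3": {Params: []string{"v5"}, Body: func(env map[string]int32) Jump {
			return Jump{Target: "blk_ret", Args: []string{"v5"}}
		}},
	}
}

func main() {
	for _, x := range []int32{-5, 0, 7} {
		fmt.Println(x, "->", run(absBlocks(), "blk0", []int32{x}))
	}
}
```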

Consider the first block `blk0`.

```
blk0: (exec_ctx:i64, module_ctx:i64, v2:i32)
    v3:i32 = Iconst_32 0x0
    v4:i32 = Icmp lt_s, v2, v3
    Brz v4, blk2
    Jump blk1
```

You will notice that, compared to the original function, it takes two extra parameters (`exec_ctx` and `module_ctx`). It then takes one parameter `v2`, corresponding to the function parameter, and it defines two variables, `v3` and `v4`: `v3` is the constant 0, and `v4` is the result of comparing `v2` to `v3` using the `i32.lt_s` instruction, so `v4` is 1 when `v2` is negative and 0 otherwise. Then, `Brz` branches to `blk2` if `v4` is zero; otherwise, execution continues to `Jump blk1`.
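The control flow of `blk0` through `blk3` can be mirrored by hand in Go with labels and `goto`. The following is a minimal sketch that treats each block as a label and the block parameter `v5` as an ordinary variable. The function name `absCFG` is hypothetical, this is not code that wazero generates, and `exec_ctx`/`module_ctx` are omitted.

```go
package main

import "fmt"

// absCFG hand-translates the abs block diagram into Go, one label per block.
func absCFG(v2 int32) int32 {
	// Go forbids goto jumping over variable declarations, so every value
	// is declared up front.
	var v3, v4, v5, v6, v7 int32

	// blk0
	v3 = 0 // Iconst_32 0x0
	if v2 < v3 {
		v4 = 1 // Icmp lt_s, v2, v3 yields 1 when v2 < v3 ...
	} else {
		v4 = 0 // ... and 0 otherwise
	}
	if v4 == 0 {
		goto blk2 // Brz v4, blk2: branch when v4 is zero
	}
	goto blk1 // Jump blk1

blk1:
	v6 = 0       // Iconst_32 0x0
	v7 = v6 - v2 // Isub v6, v2
	v5 = v7      // Jump blk3, v7: the argument is copied into blk3's parameter
	goto blk3

blk2:
	v5 = v2 // Jump blk3, v2
	goto blk3

blk3:
	return v5 // Jump blk_ret, v5
}

func main() {
	fmt.Println(absCFG(-5), absCFG(0), absCFG(7))
}
```

Note that `v5` is assigned in two places here. Lowering block parameters to plain variables gives up the single-assignment property, which the block-argument form preserves by treating `v5` as defined once, at the entry of `blk3`.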

You might also have noticed that the instructions do not correspond strictly to the original Wasm opcodes. This is because, similarly to the wazero IR used by the old compiler, this is a custom IR. You will also notice that, _on the left-hand side of the assignments_, no name occurs _twice_: every value is defined exactly once, although it may be used many times on the right-hand side (as `v2` is). This is why this form is called **single-assignment**.
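The single-assignment property is easy to check mechanically. Here is a minimal Go sketch that reports any value defined more than once, counting block parameters as definitions. The `Instr` and `Block` types are hypothetical simplifications. This check covers only the "defined once" rule, not the stronger SSA requirement that each definition dominates its uses.

```go
package main

import "fmt"

// Instr is a hypothetical, simplified instruction: Def is the value it
// defines ("" for instructions such as Brz or Jump) and Uses are the
// values it reads.
type Instr struct {
	Def  string
	Uses []string
}

// Block is a hypothetical basic block. Its Params are block arguments,
// which count as definitions just like instruction results.
type Block struct {
	Name   string
	Params []string
	Instrs []Instr
}

// redefined returns every value that is defined more than once. A valid
// SSA function yields an empty result, no matter how often values are used.
func redefined(blocks []Block) []string {
	seen := map[string]bool{}
	var dups []string
	define := func(v string) {
		if seen[v] {
			dups = append(dups, v)
		}
		seen[v] = true
	}
	for _, b := range blocks {
		for _, p := range b.Params {
			define(p)
		}
		for _, in := range b.Instrs {
			if in.Def != "" {
				define(in.Def)
			}
		}
	}
	return dups
}

// absFunc encodes the abs example.
func absFunc() []Block {
	return []Block{
		{Name: "blk0", Params: []string{"exec_ctx", "module_ctx", "v2"}, Instrs: []Instr{
			{Def: "v3"},                             // Iconst_32 0x0
			{Def: "v4", Uses: []string{"v2", "v3"}}, // Icmp lt_s, v2, v3
			{Uses: []string{"v4"}},                  // Brz v4, blk2
			{},                                      // Jump blk1
		}},
		{Name: "blk1", Instrs: []Instr{
			{Def: "v6"},
			{Def: "v7", Uses: []string{"v6", "v2"}},
			{Uses: []string{"v7"}}, // Jump blk3, v7
		}},
		{Name: "blk2", Instrs: []Instr{{Uses: []string{"v2"}}}}, // Jump blk3, v2
		{Name: "blk3", Params: []string{"v5"}, Instrs: []Instr{{Uses: []string{"v5"}}}},
	}
}

func main() {
	fmt.Println(redefined(absFunc())) // prints []
	broken := append(absFunc(), Block{Name: "bad", Instrs: []Instr{{Def: "v4"}}})
	fmt.Println(redefined(broken)) // prints [v4]
}
```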

Finally, notice how `blk1` and `blk2` end with a jump to the last block `blk3`.

```
blk1: ()
    ...
    Jump blk3, v7
blk2: ()
    Jump blk3, v2
blk3: (v5:i32)
    ...
```

`blk3` takes an argument `v5`: `blk1` jumps to `blk3` with `v7` and `blk2` jumps to `blk3` with `v2`, meaning `v5` is effectively a rename of `v7` or `v2`, depending on the originating block. If you are familiar with the traditional representation of SSA form, you will recognize that block arguments play the same role as the *Phi (Φ) function*, a special function that returns a different value depending on the incoming edge; in this case, `v5 := Φ(v7, v2)`.


## Front-End: Optimization

The SSA form makes it easier to perform a number of optimizations. For instance, we can perform constant propagation, dead code elimination, and common subexpression elimination. These optimizations either act upon the instructions within a basic block, or they act upon the control-flow graph as a whole.

At a high level, consider the following basic block, derived from the previous example by replacing the parameter `v2` with the constant -5:

```
blk0: (exec_ctx:i64, module_ctx:i64)
    v2:i32 = Iconst_32 -5
    v3:i32 = Iconst_32 0
    v4:i32 = Icmp lt_s, v2, v3
    Brz v4, blk2
    Jump blk1
```

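It is easy to see that the comparison in `v4` can be replaced by the constant `1`, because it compares two constant values (-5 and 0), and -5 < 0 holds. Once `v4` is a constant, nothing uses `v2` or `v3` anymore, so their definitions can be dropped as well. Therefore, the block can be rewritten as follows: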

```
blk0: (exec_ctx:i64, module_ctx:i64)
    v4:i32 = Iconst_32 1
    Brz v4, blk2
    Jump blk1
```
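Here is a minimal Go sketch of that rewrite on a single block. It first folds an `Icmp lt_s` whose operands are both known constants, then drops constant definitions that nothing in the block reads. The `Instr` type, the op spellings such as `"Icmp_lt_s"`, and the helper names are hypothetical. A real pass would also consider uses in other blocks, which this sketch ignores.

```go
package main

import "fmt"

// Instr is a hypothetical, simplified SSA instruction.
type Instr struct {
	Def    string   // value defined, or "" for Brz/Jump
	Op     string   // "Iconst_32", "Icmp_lt_s", "Brz" or "Jump" in this sketch
	Args   []string // operand values
	Imm    int32    // constant payload of Iconst_32
	Target string   // branch target of Brz/Jump
}

// foldCompares replaces an Icmp lt_s whose operands are both known
// constants with an Iconst_32 holding the result (1 or 0).
func foldCompares(block []Instr) []Instr {
	consts := map[string]int32{}
	out := make([]Instr, 0, len(block))
	for _, in := range block {
		if in.Op == "Icmp_lt_s" {
			a, okA := consts[in.Args[0]]
			b, okB := consts[in.Args[1]]
			if okA && okB {
				var r int32
				if a < b {
					r = 1
				}
				in = Instr{Def: in.Def, Op: "Iconst_32", Imm: r}
			}
		}
		if in.Op == "Iconst_32" {
			consts[in.Def] = in.Imm
		}
		out = append(out, in)
	}
	return out
}

// dropUnused removes constant definitions that no instruction in the
// block reads. One pass is enough for this example.
func dropUnused(block []Instr) []Instr {
	used := map[string]bool{}
	for _, in := range block {
		for _, a := range in.Args {
			used[a] = true
		}
	}
	var out []Instr
	for _, in := range block {
		if in.Op == "Iconst_32" && !used[in.Def] {
			continue
		}
		out = append(out, in)
	}
	return out
}

// exampleBlock encodes blk0 with v2 replaced by the constant -5.
func exampleBlock() []Instr {
	return []Instr{
		{Def: "v2", Op: "Iconst_32", Imm: -5},
		{Def: "v3", Op: "Iconst_32", Imm: 0},
		{Def: "v4", Op: "Icmp_lt_s", Args: []string{"v2", "v3"}},
		{Op: "Brz", Args: []string{"v4"}, Target: "blk2"},
		{Op: "Jump", Target: "blk1"},
	}
}

func main() {
	for _, in := range dropUnused(foldCompares(exampleBlock())) {
		fmt.Printf("%+v\n", in)
	}
}
```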

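However, we can now also see that the `Brz` branch is never taken. `Brz` branches only when its operand is zero, and `v4` is always 1, so control always reaches `Jump blk1` and `blk2` is never entered from `blk0`. Therefore, even the branch instruction and the constant definition `v4` can be removed: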

```
blk0: (exec_ctx:i64, module_ctx:i64)
    Jump blk1
```
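The branch simplification can be sketched in Go as follows. A `Brz` on a known non-zero constant is dropped, and a `Brz` on a known zero constant becomes an unconditional `Jump`. Constants that are left unused are then removed. As before, the `Instr` type and helper names are hypothetical, and the sketch works on one block at a time.

```go
package main

import "fmt"

// Instr is a hypothetical, simplified SSA instruction.
type Instr struct {
	Def    string   // value defined, or "" for Brz/Jump
	Op     string   // "Iconst_32", "Brz" or "Jump" in this sketch
	Args   []string // operand values
	Imm    int32    // constant payload of Iconst_32
	Target string   // branch target of Brz/Jump
}

// foldBranches simplifies a Brz whose condition is a known constant.
// Brz branches when its condition is zero, so a non-zero constant means
// the branch is never taken and can be dropped, while a zero constant
// means it is always taken and becomes an unconditional Jump.
func foldBranches(block []Instr) []Instr {
	consts := map[string]int32{}
	var out []Instr
	for _, in := range block {
		if in.Op == "Iconst_32" {
			consts[in.Def] = in.Imm
		}
		if in.Op == "Brz" {
			if c, ok := consts[in.Args[0]]; ok {
				if c != 0 {
					continue // never taken: drop the branch
				}
				// Always taken: the remaining instructions are unreachable.
				return append(out, Instr{Op: "Jump", Target: in.Target})
			}
		}
		out = append(out, in)
	}
	return out
}

// dropUnused removes constant definitions that nothing in the block reads.
func dropUnused(block []Instr) []Instr {
	used := map[string]bool{}
	for _, in := range block {
		for _, a := range in.Args {
			used[a] = true
		}
	}
	var out []Instr
	for _, in := range block {
		if in.Op == "Iconst_32" && !used[in.Def] {
			continue
		}
		out = append(out, in)
	}
	return out
}

func main() {
	blk0 := []Instr{
		{Def: "v4", Op: "Iconst_32", Imm: 1},
		{Op: "Brz", Args: []string{"v4"}, Target: "blk2"},
		{Op: "Jump", Target: "blk1"},
	}
	fmt.Printf("%+v\n", dropUnused(foldBranches(blk0)))
}
```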

This is a simple example of constant propagation and dead code elimination occurring within a basic block. However, `blk2` is now unreachable, because no other edge in the control-flow graph points to it; thus, it can be removed from the graph entirely. This is an example of dead code elimination that occurs at the control-flow graph level.
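Removing unreachable blocks amounts to a graph search from the entry block. Here is a minimal Go sketch that performs an iterative depth-first search over successor lists and keeps only the blocks it visits. The map-of-successors representation and the function names are assumptions for illustration, and `blk_ret` is left out of the graph.

```go
package main

import (
	"fmt"
	"sort"
)

// reachable returns the set of blocks reachable from entry, using an
// iterative depth-first search over the successor lists.
func reachable(succs map[string][]string, entry string) map[string]bool {
	seen := map[string]bool{entry: true}
	stack := []string{entry}
	for len(stack) > 0 {
		b := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, s := range succs[b] {
			if !seen[s] {
				seen[s] = true
				stack = append(stack, s)
			}
		}
	}
	return seen
}

// removeUnreachable returns a copy of the CFG without unreachable blocks.
// Successors of a reachable block are reachable too, so no kept block
// points at a removed one.
func removeUnreachable(succs map[string][]string, entry string) map[string][]string {
	live := reachable(succs, entry)
	out := map[string][]string{}
	for b, s := range succs {
		if live[b] {
			out[b] = s
		}
	}
	return out
}

func main() {
	// CFG of the example after blk0 was rewritten to "Jump blk1".
	succs := map[string][]string{
		"blk0": {"blk1"},
		"blk1": {"blk3"},
		"blk2": {"blk3"},
		"blk3": nil,
	}
	var names []string
	for b := range removeUnreachable(succs, "blk0") {
		names = append(names, b)
	}
	sort.Strings(names)
	fmt.Println(names) // prints [blk0 blk1 blk3]
}
```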

<!--
which is equivalent to the traditional PHI function based one, but more convenient during optimizations.
However, in this package's source code comments, we might use PHI whenever it seems necessary in order to be aligned with
existing literature, e.g. SSA level optimization algorithms are often described using PHI nodes.
-->

In practice, because WebAssembly is a compilation target, these simple optimizations are often unnecessary: the toolchain that produced the Wasm binary has usually performed them already. The optimization passes implemented in wazero are also a work in progress and, at the time of writing, further work is expected to implement more advanced optimizations.

<!-- say more about block layout etc... -->

## Back-End

<!--
The algorithm to resolve variable definitions used here is based on the paper
"Simple and Efficient Construction of Static Single Assignment Form": https://link.springer.com/content/pdf/10.1007/978-3-642-37051-9_6.pdf.
-->
...

[ssa-blocks]: https://en.wikipedia.org/wiki/Static_single-assignment_form#Block_arguments
[llvm-mlir]: https://mlir.llvm.org/docs/Rationale/Rationale/#block-arguments-vs-phi-nodes
