`site/content/docs/how_the_optimizing_compiler_works/_index.md`
+++
title = "How the Optimizing Compiler Works"
layout = "single"
+++

What is a JIT compiler?
-----------------------

In general, when we talk about a Just-In-Time (JIT) compiler, we mean a
compilation technique that spares cycles at build-time, trading them for
run-time. In other words, when a language is JIT-compiled, we usually mean that
compilation happens during run-time. Furthermore, when we use the term
JIT compilation, we also often mean that, because compilation happens _during
run-time_, we can use information collected during execution to direct the
compilation process: these types of JIT compilers are often referred to as
**tracing JITs**.

Thus, if we wanted to be pedantic, **wazero** provides an **ahead-of-time**,
**load-time** compiler: a compiler that does perform compilation at run-time,
but only when a WebAssembly module is loaded, ahead of its execution; it
currently does not collect or leverage any information gathered during the
execution of the Wasm binary itself.

It is important to make such a distinction, because a Just-In-Time compiler may
not be an optimizing compiler, and an optimizing compiler may not be a tracing
JIT. In fact, the compiler that wazero shipped before the introduction of the
new compiler architecture performed code generation at load-time, but did not
perform any optimization.
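
To make this concrete, here is a minimal sketch using wazero's public API (the
module path is hypothetical): with a compiler-based engine, native code
generation happens when the module is compiled at load time, not while the Wasm
code itself is running.

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/tetratelabs/wazero"
)

func main() {
	ctx := context.Background()
	r := wazero.NewRuntime(ctx)
	defer r.Close(ctx)

	wasm, err := os.ReadFile("module.wasm") // hypothetical module
	if err != nil {
		log.Fatal(err)
	}

	// Code generation happens here, when the module is loaded...
	compiled, err := r.CompileModule(ctx, wasm)
	if err != nil {
		log.Fatal(err)
	}

	// ...so instantiation and subsequent calls execute already-compiled code.
	if _, err := r.InstantiateModule(ctx, compiled, wazero.NewModuleConfig()); err != nil {
		log.Fatal(err)
	}
}
```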

What is an Optimizing Compiler?
-------------------------------

Wazero supports an _optimizing_ compiler in the style of other optimizing
compilers out there, such as LLVM's or V8's. Traditionally, an optimizing
compiler performs compilation in a number of steps.

Compare this to the **old compiler**, where compilation happens in one or two
steps, depending on how you count:


```goat
   Input            +---------------+     +---------------+
   Wasm Binary ---->|  DecodeModule |---->| CompileModule |----> wazero IR
                    +---------------+     +---------------+
```

That is, the module is (1) validated then (2) translated to an Intermediate
Representation (IR). The wazero IR can then be executed directly (in the case
of the interpreter) or it can be further processed and translated into native
code by the compiler. This compiler performs a straightforward translation from
the IR to native code, without any further passes. The wazero IR is not intended
for further processing beyond immediate execution or straightforward
translation.

```goat
         +---- wazero IR ----+
         |                   |
         v                   v
  +--------------+    +--------------+
  |   Compiler   |    | Interpreter  |- - - executable
  +--------------+    +--------------+
         |
     +---+-------+
     |           |
     v           v
+---------+ +---------+
|  ARM64  | |  AMD64  |
| Backend | | Backend | - - - - - - - - - executable
+---------+ +---------+
```


Validation and translation to an IR are usually called the **front-end** of a
compiler, while code generation occurs in what we call the **back-end**. The
front-end is the part of a compiler that is closer to the input, and it
generally performs machine-independent processing, such as parsing and static
validation. The back-end is the part of a compiler that is closer to the
output, and it generally includes machine-specific procedures, such as code
generation.

In the **optimizing** compiler, we still decode and translate Wasm binaries to
an intermediate representation in the front-end, but we use a textbook
representation called **SSA**, or "Static Single-Assignment Form", which is
intended for further transformation.
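
As a schematic illustration (hand-written here, not the exact textual form that
wazero prints), a function that adds 1 to its i32 parameter could be translated
roughly as follows; the defining property is that every value (`v0`, `v1`,
`v2`) is assigned exactly once:

```
;; Wasm input: add 1 to the single i32 parameter.
(func (param i32) (result i32)
  local.get 0
  i32.const 1
  i32.add)

;; Schematic SSA form of the same function.
blk0: (v0: i32)
  v1: i32 = Iconst_32 1
  v2: i32 = Iadd v0, v1
  Jump blk_ret, v2
```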

The benefit of choosing an IR that is meant for transformation is that many
optimization passes can apply directly to the IR, and thus be
machine-independent. The back-end can then be relatively simple, in that it
only has to deal with machine-specific concerns.

The wazero optimizing compiler implements the following compilation passes:

* Front-End:
  - Translation to SSA
  - Optimization

* Back-End:
  - Instruction Selection
  - Register Allocation
  - Finalization and Encoding

```goat
   Input            +-------------------+     +-------------------+
   Wasm Binary ---->|    DecodeModule   |---->|   CompileModule   |--+
                    +-------------------+     +-------------------+  |
   +------------------------------------------------------------------+
   |
   |    +---------------+            +---------------+
   +--->|   Front-End   |----------->|    Back-End   |
        +---------------+            +---------------+
                |                            |
                v                            v
               SSA                 Instruction Selection
                |                            |
                v                            v
          Optimization              Register Allocation
                |                            |
                v                            v
          Block Layout             Finalization/Encoding
```

Like the other engines, the implementation can be found under `engine`, specifically
in the `wazevo` sub-package. The entry-point is `internal/engine/wazevo/engine.go`,
which contains the implementation of the `wasm.Engine` interface.

All the passes can be dumped to the console for debugging by enabling the build-time
flags under `internal/engine/wazevo/wazevoapi/debug_options.go`. The flags are disabled
by default and should only be enabled during debugging. They may also change in the future.

In the following, we will assume all paths to be relative to `internal/engine/wazevo`,
and omit the prefix.

<hr>

* Next Section: [Front-End](frontend/)
`site/content/docs/how_the_optimizing_compiler_works/appendix.md`
+++
title = "Appendix: Trampolines"
layout = "single"
+++

Trampolines are used to interface between the Go runtime and the generated
code, in two cases:

- when we need to **enter the generated code** from the Go runtime.
- when we need to **leave the generated code** to invoke a host function
(written in Go).

In this section we want to complete the picture of how a Wasm function gets
translated to executable code in the optimizing compiler, by describing how
execution jumps into the generated code at run-time.

## Entering the Generated Code

At run-time, user space invokes a Wasm function through the public
`api.Function` interface, using the methods `Call()` or `CallWithStack()`. The
implementation of these methods, in turn, eventually invokes an ASM
**trampoline**. The signature of this trampoline in Go code is:

```go
func entrypoint(
	preambleExecutable, functionExecutable *byte,
	executionContextPtr uintptr, moduleContextPtr *byte,
	paramResultStackPtr *uint64,
	goAllocatedStackSlicePtr uintptr)
```

- `preambleExecutable` is a pointer to the generated code for the preamble (see
  below).
- `functionExecutable` is a pointer to the generated code for the function (as
  described in the previous sections).
- `executionContextPtr` is a raw pointer to the `wazevo.executionContext`
  struct. This struct is used to save the state of the Go runtime before
  entering or leaving the generated code. It also holds shared state between the
  Go runtime and the generated code, such as the exit code that is used to
  terminate execution on failure, or to suspend it to invoke host functions.
- `moduleContextPtr` is a pointer to the `wazevo.moduleContextOpaque` struct.
  Its contents are essentially pointers to module instance-specific objects, as
  well as functions. This is sometimes called "VMContext" in other Wasm
  runtimes.
- `paramResultStackPtr` is a pointer to the slice where the arguments and
  results of the function are passed.
- `goAllocatedStackSlicePtr` is an aligned pointer to the Go-allocated stack
  for holding values and call frames. For further details refer to
  [/internal/engine/compiler/engine.go][wazero-engine-stack].
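
For context, here is a hedged sketch of how user code reaches this trampoline
through the public API; `callAdd` and the argument values are made up for
illustration, and `fn` is assumed to come from an instantiated module:

```go
import (
	"context"

	"github.com/tetratelabs/wazero/api"
)

// callAdd is a hypothetical helper; fn is assumed to be an exported
// (i32, i32) -> i32 function such as "add".
func callAdd(ctx context.Context, fn api.Function) error {
	// Call allocates the result slice internally.
	if _, err := fn.Call(ctx, api.EncodeI32(1), api.EncodeI32(2)); err != nil {
		return err
	}

	// CallWithStack reuses a single slice for both parameters and results;
	// this is the slice that ultimately reaches entrypoint() as
	// paramResultStackPtr.
	stack := []uint64{api.EncodeI32(1), api.EncodeI32(2)}
	return fn.CallWithStack(ctx, stack)
}
```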

The ASM trampoline is guaranteed to follow the stable calling convention
described in [Go's ASM documentation][abi-asm] (sometimes referred to as
[ABI0][proposal-register-cc]). The trampoline can be found in
`backend/isa/<arch>/abi_entry_<arch>.s`.

For each given architecture, the trampoline:
- moves the arguments to conventional registers that are documented to be
  free at the time of the call, and
- jumps into the execution of the generated code for the preamble.

The **preamble** is generated separately from the rest of the function, and
before it.

This is implemented in `machine.CompileEntryPreamble(*ssa.Signature)`. The
procedure first instantiates a `backend.FunctionABI` struct with metadata about
the expected ABI for a function with a given signature, using the algorithm
outlined in [Go's documentation][abi-cc].

The preamble sets the fields in the `wazevo.executionContext`.

At the beginning of the preamble:

- we set a register to point to the `*wazevo.executionContext` struct,
- we save the stack pointers, frame pointers, return addresses, etc. to that
  struct,
- we update the stack pointer to point to `paramResultStackPtr`.

The generated code works under the assumption that the preamble has been
entered through the aforementioned trampoline. Thus, it assumes that the
arguments can be found in specific registers.

The preamble then assigns the arguments pointed at by `paramResultStackPtr` to
the registers that the generated code expects.

Finally, it invokes the generated code for the function.

The epilogue reverses part of the process, finally returning control to the
caller of the `entrypoint()` function, and thus to the Go runtime. The caller of
`entrypoint()` is also responsible for completing the clean-up procedure by
invoking `afterGoFunctionCallEntrypoint()` (again, implemented in
backend-specific ASM), which will restore the stack pointers and return
control to the caller of the function.

The arch-specific code can be found in
`backend/isa/<arch>/abi_entry_preamble.go`.

[wazero-engine-stack]: https://github.com/tetratelabs/wazero/blob/095b49f74a5e36ce401b899a0c16de4eeb46c054/internal/engine/compiler/engine.go#L77-L132
[abi-arm64]: https://tip.golang.org/src/cmd/compile/abi-internal#arm64-architecture
[abi-amd64]: https://tip.golang.org/src/cmd/compile/abi-internal#amd64-architecture
[abi-cc]: https://tip.golang.org/src/cmd/compile/abi-internal#function-call-argument-and-result-passing


## Leaving the Generated Code

In "[How do compiler functions work?][how-do-compiler-functions-work]", we
already outlined how _leaving_ the generated code works with the help of a
function. We will complete here the picture by briefly describing the code that
is generated.

When the generated code needs to return control to the Go runtime, it inserts a
meta-instruction called `exitSequence` in both the `amd64` and `arm64`
backends. This meta-instruction sets the `exitCode` in the
`wazevo.executionContext` struct, restores the stack pointers, and then returns
control to the caller of the `entrypoint()` function described above.

As described in "[How do compiler functions
work?][how-do-compiler-functions-work]", the mechanism is essentially the same
when invoking a host function or raising an error. However, when a host
function is invoked, the `exitCode` also indicates the identifier of the host
function to be invoked.

The magic really happens in the `backend.Machine.CompileGoFunctionTrampoline()`
method. This method is actually invoked when host modules are being
instantiated. It generates a trampoline that is used to invoke such functions
from the generated code.

This trampoline implements essentially the same prologue as the `entrypoint()`,
but it also reserves space for the arguments and results of the function to be
invoked.

A host function has the signature:

```go
func(ctx context.Context, stack []uint64)
```

The function arguments in the `stack` parameter are copied over to the reserved
slots of the real stack. For instance, on `arm64` the stack layout would look
as follows (on `amd64` it would be similar):

```goat
                     (high address)
    SP ------> +-----------------+ <----+
               |     .......     |      |
               |      ret Y      |      |
               |     .......     |      |
               |      ret 0      |      |
               |      arg X      |      | size_of_arg_ret
               |     .......     |      |
               |      arg 1      |      |
               |      arg 0      | <----+ <-------- originalArg0Reg
               | size_of_arg_ret |
               |  ReturnAddress  |
               +-----------------+ <----+
               |      xxxx       |      |  ;; might be padded to make it 16-byte aligned.
          +--->|  arg[N]/ret[M]  |      |
sliceSize |    |   ............  |      | goCallStackSize
          |    |  arg[1]/ret[1]  |      |
          +--->|  arg[0]/ret[0]  | <----+ <-------- arg0ret0AddrReg
               |    sliceSize    |
               |   frame_size    |
               +-----------------+
                     (low address)
```

Finally, the trampoline jumps into the execution of the host function using the
`exitSequence` meta-instruction.

Upon return, the process is reversed.
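
To tie this back to user code, here is a hedged sketch of registering a host
function with this stack-based signature through wazero's public API (the
module and function names are made up). When a host module like this is
instantiated, the `CompileGoFunctionTrampoline()` method described above
generates the trampoline used to reach it from the generated code:

```go
package main

import (
	"context"
	"log"

	"github.com/tetratelabs/wazero"
	"github.com/tetratelabs/wazero/api"
)

func main() {
	ctx := context.Background()
	r := wazero.NewRuntime(ctx)
	defer r.Close(ctx)

	// A host function in the stack-based form: parameters are read from
	// stack[0..N-1] and results are written back into the same slice.
	add := api.GoFunc(func(ctx context.Context, stack []uint64) {
		x, y := api.DecodeI32(stack[0]), api.DecodeI32(stack[1])
		stack[0] = api.EncodeI32(x + y)
	})

	// Instantiating the host module is the point at which the Go-function
	// trampolines are generated.
	if _, err := r.NewHostModuleBuilder("env").
		NewFunctionBuilder().
		WithGoFunction(add, []api.ValueType{api.ValueTypeI32, api.ValueTypeI32},
			[]api.ValueType{api.ValueTypeI32}).
		Export("add").
		Instantiate(ctx); err != nil {
		log.Fatal(err)
	}
}
```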

## Code

- The trampoline to enter the generated function is implemented by the
  `backend.Machine.CompileEntryPreamble()` method.
- The trampoline used to return traps and invoke host functions is generated by
  the `backend.Machine.CompileGoFunctionTrampoline()` method.

You can find arch-specific implementations in
`backend/isa/<arch>/abi_go_call.go`,
`backend/isa/<arch>/abi_entry_preamble.go`, etc. The trampolines are found
under `backend/isa/<arch>/abi_entry_<arch>.s`.

## Further References

- Go's [internal ABI documentation][abi-internal] complements Go's ASM
  documentation with details on the internal, unstable ABI, known as
  *ABIInternal*. Note, however, that the calling convention for ASM is
  different and is described in the ASM documentation.
- Go's [ASM documentation][abi-asm] describes the stable, stack-based
  calling convention for ASM (_ABI0_).
- Raphael Poss's [The Go low-level calling convention on
x86-64][go-call-conv-x86] is also an excellent reference for `amd64`.

[abi-asm]: https://go.dev/doc/asm
[abi-internal]: https://tip.golang.org/src/cmd/compile/abi-internal
[go-call-conv-x86]: https://dr-knz.net/go-calling-convention-x86-64.html
[proposal-register-cc]: https://go.googlesource.com/proposal/+/master/design/40724-register-calling.md#background
[how-do-compiler-functions-work]: ../../how_do_compiler_functions_work/
