`site/content/docs/how_the_optimizing_compiler_works/_index.md`
+++
title = "How the Optimizing Compiler Works"
layout = "single"
+++

What is a JIT compiler?
-----------------------

In general, when we talk about a Just-In-Time (JIT) compiler, we mean a
compilation technique that spares cycles at build-time, trading them for
run-time. In other words, when a language is JIT-compiled, we usually mean that
compilation happens during run-time. Furthermore, when we use the term
JIT compilation, we also often mean that, because compilation happens _during
run-time_, we can use information collected during execution to direct the
compilation process: these types of JIT compilers are often referred to as
**tracing JITs**.

Thus, if we wanted to be pedantic, **wazero** provides an **ahead-of-time**,
**load-time** compiler: a compiler that does perform compilation at run-time,
but only when a WebAssembly module is loaded, ahead of its execution; it
currently does not collect or leverage any information gathered during the
execution of the Wasm binary itself.

It is important to make such a distinction, because a Just-In-Time compiler may
not be an optimizing compiler, and an optimizing compiler may not be a tracing
JIT. In fact, the compiler that wazero shipped before the introduction of the
new compiler architecture performed code generation at load-time, but did not
perform any optimization.
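
To make this concrete, here is a minimal sketch using wazero's public API (the
module path is hypothetical): with a compiler-based engine, native code
generation happens when the module is compiled at load time, not while the Wasm
code itself is running.

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/tetratelabs/wazero"
)

func main() {
	ctx := context.Background()
	r := wazero.NewRuntime(ctx)
	defer r.Close(ctx)

	wasm, err := os.ReadFile("module.wasm") // hypothetical module
	if err != nil {
		log.Fatal(err)
	}

	// Code generation happens here, when the module is loaded...
	compiled, err := r.CompileModule(ctx, wasm)
	if err != nil {
		log.Fatal(err)
	}

	// ...so instantiation and subsequent calls execute already-compiled code.
	if _, err := r.InstantiateModule(ctx, compiled, wazero.NewModuleConfig()); err != nil {
		log.Fatal(err)
	}
}
```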

What is an Optimizing Compiler?
-------------------------------

Wazero supports an _optimizing_ compiler in the style of other optimizing
compilers out there, such as LLVM's or V8's. Traditionally, an optimizing
compiler performs compilation in a number of steps.

Compare this to the **old compiler**, where compilation happens in one or two
steps, depending on how you count:


```goat
   Input            +---------------+     +---------------+
   Wasm Binary ---->|  DecodeModule |---->| CompileModule |----> wazero IR
                    +---------------+     +---------------+
```

That is, the module is (1) validated then (2) translated to an Intermediate
Representation (IR). The wazero IR can then be executed directly (in the case
of the interpreter) or it can be further processed and translated into native
code by the compiler. This compiler performs a straightforward translation from
the IR to native code, without any further passes. The wazero IR is not intended
for further processing beyond immediate execution or straightforward
translation.

```goat
         +---- wazero IR ----+
         |                   |
         v                   v
  +--------------+    +--------------+
  |   Compiler   |    | Interpreter  |- - - executable
  +--------------+    +--------------+
         |
     +---+-------+
     |           |
     v           v
+---------+ +---------+
|  ARM64  | |  AMD64  |
| Backend | | Backend | - - - - - - - - - executable
+---------+ +---------+
```


Validation and translation to an IR are usually called the **front-end** of a
compiler, while code generation occurs in what we call the **back-end**. The
front-end is the part of a compiler that is closer to the input, and it
generally performs machine-independent processing, such as parsing and static
validation. The back-end is the part of a compiler that is closer to the
output, and it generally includes machine-specific procedures, such as code
generation.

In the **optimizing** compiler, we still decode and translate Wasm binaries to
an intermediate representation in the front-end, but we use a textbook
representation called **SSA**, or "Static Single-Assignment Form", which is
intended for further transformation.
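
As a schematic illustration (hand-written here, not the exact textual form that
wazero prints), a function that adds 1 to its i32 parameter could be translated
roughly as follows; the defining property is that every value (`v0`, `v1`,
`v2`) is assigned exactly once:

```
;; Wasm input: add 1 to the single i32 parameter.
(func (param i32) (result i32)
  local.get 0
  i32.const 1
  i32.add)

;; Schematic SSA form of the same function.
blk0: (v0: i32)
  v1: i32 = Iconst_32 1
  v2: i32 = Iadd v0, v1
  Jump blk_ret, v2
```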

The benefit of choosing an IR that is meant for transformation is that many
optimization passes can apply directly to the IR, and thus be
machine-independent. The back-end can then be relatively simple, in that it
only has to deal with machine-specific concerns.

The wazero optimizing compiler implements the following compilation passes:

* Front-End:
  - Translation to SSA
  - Optimization

* Back-End:
  - Instruction Selection
  - Register Allocation
  - Finalization and Encoding

```goat
   Input            +-------------------+     +-------------------+
   Wasm Binary ---->|    DecodeModule   |---->|   CompileModule   |--+
                    +-------------------+     +-------------------+  |
   +------------------------------------------------------------------+
   |
   |    +---------------+            +---------------+
   +--->|   Front-End   |----------->|    Back-End   |
        +---------------+            +---------------+
                |                            |
                v                            v
               SSA                 Instruction Selection
                |                            |
                v                            v
          Optimization              Register Allocation
                |                            |
                v                            v
          Block Layout             Finalization/Encoding
```

Like the other engines, the implementation can be found under `engine`, specifically
in the `wazevo` sub-package. The entry-point is `internal/engine/wazevo/engine.go`,
which contains the implementation of the `wasm.Engine` interface.

All the passes can be dumped to the console for debugging by enabling the build-time
flags under `internal/engine/wazevo/wazevoapi/debug_options.go`. The flags are disabled
by default and should only be enabled during debugging. They may also change in the future.

In the following, we will assume all paths to be relative to `internal/engine/wazevo`,
and omit the prefix.

<hr>

* Next Section: [Front-End](frontend/)
`site/content/docs/how_the_optimizing_compiler_works/appendix.md`
+++
title = "Appendix: Trampolines"
layout = "single"
+++

Trampolines are used to interface between the Go runtime and the generated
code, in two cases:

- when we need to **enter the generated code** from the Go runtime.
- when we need to **leave the generated code** to invoke a host function
(written in Go).

In this section we want to complete the picture of how a Wasm function gets
translated to executable code in the optimizing compiler, by describing how
execution jumps into the generated code at run-time.

## Entering the Generated Code

At run-time, user space invokes a Wasm function through the public
`api.Function` interface, using the methods `Call()` or `CallWithStack()`. The
implementation of these methods, in turn, eventually invokes an ASM
**trampoline**. The signature of this trampoline in Go code is:

```go
func entrypoint(
	preambleExecutable, functionExecutable *byte,
	executionContextPtr uintptr, moduleContextPtr *byte,
	paramResultStackPtr *uint64,
	goAllocatedStackSlicePtr uintptr)
```

- `preambleExecutable` is a pointer to the generated code for the preamble (see
  below).
- `functionExecutable` is a pointer to the generated code for the function (as
  described in the previous sections).
- `executionContextPtr` is a raw pointer to the `wazevo.executionContext`
  struct. This struct is used to save the state of the Go runtime before
  entering or leaving the generated code. It also holds shared state between the
  Go runtime and the generated code, such as the exit code that is used to
  terminate execution on failure, or to suspend it to invoke host functions.
- `moduleContextPtr` is a pointer to the `wazevo.moduleContextOpaque` struct.
  Its contents are essentially pointers to module instance-specific objects, as
  well as functions. This is sometimes called "VMContext" in other Wasm
  runtimes.
- `paramResultStackPtr` is a pointer to the slice where the arguments and
  results of the function are passed.
- `goAllocatedStackSlicePtr` is an aligned pointer to the Go-allocated stack
  for holding values and call frames. For further details refer to
  [/internal/engine/compiler/engine.go][wazero-engine-stack].
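
For context, here is a hedged sketch of how user code reaches this trampoline
through the public API; `callAdd` and the argument values are made up for
illustration, and `fn` is assumed to come from an instantiated module:

```go
import (
	"context"

	"github.com/tetratelabs/wazero/api"
)

// callAdd is a hypothetical helper; fn is assumed to be an exported
// (i32, i32) -> i32 function such as "add".
func callAdd(ctx context.Context, fn api.Function) error {
	// Call allocates the result slice internally.
	if _, err := fn.Call(ctx, api.EncodeI32(1), api.EncodeI32(2)); err != nil {
		return err
	}

	// CallWithStack reuses a single slice for both parameters and results;
	// this is the slice that ultimately reaches entrypoint() as
	// paramResultStackPtr.
	stack := []uint64{api.EncodeI32(1), api.EncodeI32(2)}
	return fn.CallWithStack(ctx, stack)
}
```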

The ASM trampoline is guaranteed to follow the stable calling convention
described in [Go's ASM documentation][abi-asm] (sometimes referred to as
[ABI0][proposal-register-cc]). The trampoline can be found in
`backend/isa/<arch>/abi_entry_<arch>.s`.

For each given architecture, the trampoline:
- moves the arguments to conventional registers that are documented to be
  free at the time of the call, and
- jumps into the execution of the generated code for the preamble.

The **preamble** is generated separately from the rest of the function, and
before it.

This is implemented in `machine.CompileEntryPreamble(*ssa.Signature)`. The
procedure first instantiates a `backend.FunctionABI` struct with metadata about
the expected ABI for a function with a given signature, using the algorithm
outlined in [Go's documentation][abi-cc].

The preamble sets the fields in the `wazevo.executionContext`.

At the beginning of the preamble:

- we set a register to point to the `*wazevo.executionContext` struct,
- we save the stack pointers, frame pointers, return addresses, etc. to that
  struct,
- we update the stack pointer to point to `paramResultStackPtr`.

The generated code works under the assumption that the preamble has been
entered through the aforementioned trampoline. Thus, it assumes that the
arguments can be found in specific registers.

The preamble then assigns the arguments pointed at by `paramResultStackPtr` to
the registers that the generated code expects.

Finally, it invokes the generated code for the function.

The epilogue reverses part of the process, finally returning control to the
caller of the `entrypoint()` function, and thus to the Go runtime. The caller of
`entrypoint()` is also responsible for completing the clean-up procedure by
invoking `afterGoFunctionCallEntrypoint()` (again, implemented in
backend-specific ASM), which will restore the stack pointers and return
control to the caller of the function.

The arch-specific code can be found in
`backend/isa/<arch>/abi_entry_preamble.go`.

[wazero-engine-stack]: https://github.com/tetratelabs/wazero/blob/095b49f74a5e36ce401b899a0c16de4eeb46c054/internal/engine/compiler/engine.go#L77-L132
[abi-arm64]: https://tip.golang.org/src/cmd/compile/abi-internal#arm64-architecture
[abi-amd64]: https://tip.golang.org/src/cmd/compile/abi-internal#amd64-architecture
[abi-cc]: https://tip.golang.org/src/cmd/compile/abi-internal#function-call-argument-and-result-passing


## Leaving the Generated Code

In "[How do compiler functions work?][how-do-compiler-functions-work]", we
already outlined how _leaving_ the generated code works with the help of a
function. We will complete here the picture by briefly describing the code that
is generated.

When the generated code needs to return control to the Go runtime, it inserts a
meta-instruction called `exitSequence` in both the `amd64` and `arm64`
backends. This meta-instruction sets the `exitCode` in the
`wazevo.executionContext` struct, restores the stack pointers, and then returns
control to the caller of the `entrypoint()` function described above.

As described in "[How do compiler functions
work?][how-do-compiler-functions-work]", the mechanism is essentially the same
when invoking a host function or raising an error. However, when a host
function is invoked, the `exitCode` also indicates the identifier of the host
function to be invoked.

The magic really happens in the `backend.Machine.CompileGoFunctionTrampoline()`
method. This method is actually invoked when host modules are being
instantiated. It generates a trampoline that is used to invoke such functions
from the generated code.

This trampoline implements essentially the same prologue as the `entrypoint()`,
but it also reserves space for the arguments and results of the function to be
invoked.

A host function has the signature:

```go
func(ctx context.Context, stack []uint64)
```

The function arguments in the `stack` parameter are copied over to the reserved
slots of the real stack. For instance, on `arm64` the stack layout would look
as follows (on `amd64` it would be similar):

```goat
                     (high address)
    SP ------> +-----------------+ <----+
               |     .......     |      |
               |      ret Y      |      |
               |     .......     |      |
               |      ret 0      |      |
               |      arg X      |      | size_of_arg_ret
               |     .......     |      |
               |      arg 1      |      |
               |      arg 0      | <----+ <-------- originalArg0Reg
               | size_of_arg_ret |
               |  ReturnAddress  |
               +-----------------+ <----+
               |      xxxx       |      |  ;; might be padded to make it 16-byte aligned.
          +--->|  arg[N]/ret[M]  |      |
sliceSize |    |   ............  |      | goCallStackSize
          |    |  arg[1]/ret[1]  |      |
          +--->|  arg[0]/ret[0]  | <----+ <-------- arg0ret0AddrReg
               |    sliceSize    |
               |   frame_size    |
               +-----------------+
                     (low address)
```

Finally, the trampoline jumps into the execution of the host function using the
`exitSequence` meta-instruction.

Upon return, the process is reversed.
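
To tie this back to user code, here is a hedged sketch of registering a host
function with this stack-based signature through wazero's public API (the
module and function names are made up). When a host module like this is
instantiated, the `CompileGoFunctionTrampoline()` method described above
generates the trampoline used to reach it from the generated code:

```go
package main

import (
	"context"
	"log"

	"github.com/tetratelabs/wazero"
	"github.com/tetratelabs/wazero/api"
)

func main() {
	ctx := context.Background()
	r := wazero.NewRuntime(ctx)
	defer r.Close(ctx)

	// A host function in the stack-based form: parameters are read from
	// stack[0..N-1] and results are written back into the same slice.
	add := api.GoFunc(func(ctx context.Context, stack []uint64) {
		x, y := api.DecodeI32(stack[0]), api.DecodeI32(stack[1])
		stack[0] = api.EncodeI32(x + y)
	})

	// Instantiating the host module is the point at which the Go-function
	// trampolines are generated.
	if _, err := r.NewHostModuleBuilder("env").
		NewFunctionBuilder().
		WithGoFunction(add, []api.ValueType{api.ValueTypeI32, api.ValueTypeI32},
			[]api.ValueType{api.ValueTypeI32}).
		Export("add").
		Instantiate(ctx); err != nil {
		log.Fatal(err)
	}
}
```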

## Code

- The trampoline to enter the generated function is implemented by the
  `backend.Machine.CompileEntryPreamble()` method.
- The trampoline used to return traps and invoke host functions is generated by
  the `backend.Machine.CompileGoFunctionTrampoline()` method.

You can find arch-specific implementations in
`backend/isa/<arch>/abi_go_call.go`,
`backend/isa/<arch>/abi_entry_preamble.go`, etc. The trampolines are found
under `backend/isa/<arch>/abi_entry_<arch>.s`.

## Further References

- Go's [internal ABI documentation][abi-internal] complements Go's ASM
  documentation with details on the internal, unstable ABI, known as
  *ABIInternal*. Note, however, that the calling convention for ASM is
  different and is described in the ASM documentation.
- Go's [ASM documentation][abi-asm] describes the stable, stack-based
  calling convention for ASM (_ABI0_).
- Raphael Poss's [The Go low-level calling convention on
x86-64][go-call-conv-x86] is also an excellent reference for `amd64`.

[abi-asm]: https://go.dev/doc/asm
[abi-internal]: https://tip.golang.org/src/cmd/compile/abi-internal
[go-call-conv-x86]: https://dr-knz.net/go-calling-convention-x86-64.html
[proposal-register-cc]: https://go.googlesource.com/proposal/+/master/design/40724-register-calling.md#background
[how-do-compiler-functions-work]: ../../how_do_compiler_functions_work/
