This repo provides convenient methods of using binary artifacts of GHC's wasm backend. It's also a staging area for the GHC wasm backend documentation.
This repo is a nix flake. The default output is a derivation that bundles all provided tools:
$ nix shell https://gitlab.haskell.org/ghc/ghc-wasm-meta/-/archive/master/ghc-wasm-meta-master.tar.gz
$ echo 'main = putStrLn "hello world"' > hello.hs
$ wasm32-wasi-ghc hello.hs -o hello.wasm
[1 of 2] Compiling Main ( hello.hs, hello.o )
[2 of 2] Linking hello.wasm
$ wasmtime ./hello.wasm
hello world
First start will download a bunch of stuff, but won't take too long since it just patches the binaries and performs no compilation. There is no need to set up a binary cache.
Note that there are different GHC flavours available. The default
flake output uses the gmp
flavour, though you can also use the
all_native
, all_unreg
and all_9_6
flake outputs to get different
flavours. See the next subsection for explanations of these flavours.
This repo also provides an installation script that installs the
provided tools to ~/.ghc-wasm
:
$ curl https://gitlab.haskell.org/ghc/ghc-wasm-meta/-/raw/master/bootstrap.sh | sh
...
Everything set up in /home/username/.ghc-wasm.
Run 'source /home/username/.ghc-wasm/env' to add tools to your PATH.
After installing, ~/.ghc-wasm
will contain all the installed tools,
plus:
~/.ghc-wasm/env
which can be sourced into the current shell to add the tools toPATH
, plus all the environment variables needed to buildwasm32-wasi-ghc
~/.ghc-wasm/add_to_github_path.sh
which can be run in GitHub actions, so later steps in the same job can access the same environment variables set byenv
bootstrap.sh
is just a thin wrapper to download the ghc-wasm-meta
repo tarball and invoke setup.sh
. These scripts can be configured
via these environment variables:
PREFIX
: installation destination, defaults to~/.ghc-wasm
FLAVOUR
: can begmp
,native
orunreg
.- The
gmp
flavour uses thegmp
bignum backend and the wasm native codegen. It's the default flavour, offers good compile-time and run-time performance. native
uses thenative
bignum backend and the wasm native codegen. Compared to thegmp
flavour, the run-time performance may be slightly worse if the workload involves bigInteger
operations. May be useful if you are compiling proprietary projects and have concerns about statically linking the LGPL-licensedgmp
library.- The
unreg
flavour uses thegmp
bignum backend and the unregisterised C codegen. Compared to the default flavour, compile-time performance is noticeably worse. May be useful for debugging the native codegen, since there are less GHC test suite failures in the unregisterised codegen at the moment. - The
9.6
/9.8
/9.10
flavour tracks theghc-9.6
/ghc-9.8
/ghc-9.10
release branch instead of themaster
branch. It uses thegmp
bignum backend and the wasm native codegen.
- The
SKIP_GHC
: set this to skip installingcabal
andghc
Note that if you use the 9.6
/9.8
/9.10
flavour, the
wasm32-wasi-cabal
wrapper won't automatically set up head.hackage
in the global config file. In the early days of ghc-9.10
, this may
result in more packages being rejected at compile time. This is true
for both nix/non-nix installation methods.
setup.sh
requires cc
, curl
, jq
, unzip
, zstd
to run.
Besides wasm MVP, certain extensions are used. The feature flags are enabled globally in our wasi-sdk build, passed at GHC configure time, and the wasm NCG may make use of the features. The rationale of post-MVP wasm feature inclusion:
- Supported by default in latest versions of Chrome/Firefox/Safari/wasmtime (check wasm roadmap for details)
- LLVM support has been stable enough (doesn't get into our way when enabled globally)
List of wasm extensions that we use:
- 128-bit packed SIMD
- Non-trapping Float-to-int Conversions
- Sign-extension operators
- Bulk Memory Operations
- Import/Export mutable globals
- Multi-value
- Reference Types
The target triple is wasm32-wasi
, and it uses WASI snapshot 1 as
used in wasi-libc
.
List of wasm extensions that we don't use yet but are keeping an eye on:
- Tail Call, blocked by webkit. Note that we already support wasm tail calls, but it's an opt-in feature for now.
The output .wasm
modules are known to run on these runtimes:
These tools are all packaged in this repo, available in both the nix
flake & setup.sh
installation. The recommended default runtime is
wasmtime
, other ones are included for testing purposes.
deno
, using https://deno.land/std/wasi/snapshot_preview1.ts as WASI implementationnode
, using the builtinwasi
module to provide WASI implementationbun
, using the builtin WASI implementation
Latest releases of Chrome/Firefox/Safari. A JavaScript library is needed to provide the WASI implementation, the following are known to work to some extent:
Note: if you are using the GHC wasm backend to target browsers, we
already support JSFFI in master
as well as upcoming 9.10 releases.
See the relevant
section
in the GHC user's guide for details. The following content is archived
here for the time being, but eventually we'll move all documentation
to the GHC user's guide.
If you want to embed the compiled wasm module into a host language, like in JavaScript for running in a browser, then it's highly likely you want to compile Haskell to a WASI reactor module.
The WASI spec includes certain
syscalls
that are provided as the wasi_snapshot_preview1
wasm imports.
Additionally, the current WASI
ABI
specifies two different kinds of WASI modules: commands and reactors.
A WASI command module is what you get by default when using
wasm32-wasi-ghc
to compile & link a Haskell program. It's called
"command" as in a conventional command-line program, with similar
usage and lifecycle: run it with something like wasmtime
, optionally
passing some arguments and environment variables, it'll run to
completion and probably causing some side effects using whatever
capabilities granted. After it runs to completion, the program state
is finalized.
A WASI reactor module is produced by special link-time flags. It's called "reactor" since it reacts to external calls of its exported functions. Once a reactor module is initialized, the program state is persisted, so if calling an export changes internal state (e.g. sets a global variable), subsequent calls will observe that change.
When linking a program for almost any platform out there, the linker needs to handle ctors(constructors) & dtors(destructors). ctors and dtors are special functions that need to be invoked to correctly initialize/finalize certain runtime state. Even if the user program doesn't use ctors/dtors, as long as the program links to libc, ctors/dtors will need to be handled.
The wasm spec does include a start function. However, due to technical reasons, what a start function can do is rather limited, and may not be sufficient to support ctors/dtors in libc and other places. So the WASI spec needs to address this fact, and the command/reactor distinction arises:
- A WASI command module must export a
_start
function. You can see how_start
is defined inwasi-libc
here. It'll call the ctors, then call the main function in user code, and finally call the dtors. Since the dtors are called, the program state is finalized, so attempting to call any export after this point is undefined behavior! - A WASI reactor module may export an
_initialize
function, if it exists, it must be called exactly once before any other exports are called. See its definition here, it merely calls the ctors. So after_initialize
, you can call the exports freely, reusing the instance state. If you want to "finalize", you're in charge of exporting and calling__wasm_call_dtors
yourself.
The command module works well for wasm modules that are intended to be used like a conventional CLI app. On the otherhand, for more advanced use cases like running in a browser, you almost always want to create a reactor module instead.
Suppose there's a Hello.hs
that has a fib :: Int -> Int
. To invoke
it from the JavaScript host, first you need to write down a foreign export
for it:
foreign export ccall fib :: Int -> Int
GHC will create a C function HsInt fib(HsInt)
that calls into the
actual fib
Haskell function. Now you need to compile and link it
with special flags:
$ wasm32-wasi-ghc Hello.hs -o Hello.wasm -no-hs-main -optl-mexec-model=reactor -optl-Wl,--export=hs_init,--export=myMain
Some explainers:
-no-hs-main
, since we only care about manually exported functions and don't have a defaultmain :: IO ()
-optl-mexec-model=reactor
passes-mexec-model=reactor
toclang
when linking, so it creates a WASI reactor instead of a WASI command-optl-Wl,--export=hs_init,--export=fib
passes the linker flags to exporths_init
andfib
-o Hello.wasm
is necessary, otherwise the output name defaults toa.out
which can be confusing
The flags above also work in the ghc-options
field of a cabal
executable component, see
here
for an example.
Now, here's an example deno
script to load and run Hello.wasm
:
import WasiContext from "https://deno.land/[email protected]/wasi/snapshot_preview1.ts";
const context = new WasiContext({});
const instance = (
await WebAssembly.instantiate(await Deno.readFile("Hello.wasm"), {
wasi_snapshot_preview1: context.exports,
})
).instance;
// The initialize() method will call the module's _initialize export
// under the hood. This is only true for the wasi implementation used
// in this example! If you're using another wasi implementation, do
// read its source code to figure out whether you need to manually
// call the module's _initialize export!
context.initialize(instance);
// This function is a part of GHC's RTS API. It must be called before
// any other exported Haskell functions are called.
instance.exports.hs_init(0, 0);
console.log(instance.exports.fib(10));
For simplicity, we call hs_init
with argc
set to 0
and argv
set to NULL
, assuming we don't use things like
getArgs
/getProgName
in the program. Now, we can call fib
, or any
function with foreign export
and the correct --export=
flag.
Before we add first-class JavaScript interop feature, it's only
possible to use the ccall
calling convention for foreign exports.
It's still possible to exchange large values between Haskell and
JavaScript:
- Add
--export
flag formalloc
/free
. You can now allocate and free linear memory buffers that can be visible to the Haskell world, since the entire linear memory is available as thememory
export. - In the Haskell world, you can pass
Ptr
as foreign export argument/return values. - You can also use
mallocBytes
inForeign.Marshal.Alloc
to allocate buffers in the Haskell world. A buffer allocated bymallocBytes
in Haskell can be passed to JavaScript and be freed by the exportedfree
, and vice versa.
If you pass a buffer malloc
ed in JavaScript into Haskell, before
free
ing that buffer after the exported Haskell function returns,
look into the Haskell code and double check if that buffer is indeed
fully consumed in Haskell! Otherwise, use-after-free bugs await.
Which functions can be exported via the --export
flag?
- Any C function which symbol is externally visible. For libc, there
is a
list
of all externally visible symbols. For the GHC RTS, see
HsFFI.h
andRtsAPI.h
for the functions that you're likely interested in. For instance,hs_init
is inHsFFI.h
, other variants ofhs_init*
is inRtsAPI.h
. - Any Haskell function that has been exported via a
foreign export
declaration.
Further reading:
In addition to the standard wasi_snapshot_preview1
import module,
you can add other import modules to import host JavaScript functions
which can be called in C/Haskell. Of course, the resulting wasm module
is not completely compliant to the WASI spec, and cannot be run in
non-browser runtimes that only supports WASI (like wasmtime
), but
it's not an issue if it's only intended to be run in browsers.
Here's an example:
#include <stdint.h>
void _fill(void *buf, uint32_t len) __attribute__((
__import_module__("env"),
__import_name__("foo")
));
void fill(void *buf, uint32_t len) {
_fill(buf, len);
}
The above assembly source file can be saved as fill.c
, and
compiled/linked with other C/Haskell sources. The fill
function can
be called in C as long as you write down its prototype; it can also be
called in Haskell just like any other normal C function.
Now, when creating the instance, in addition to providing the
wasi_snapshot_preview1
import module, you also need to provide the
env
import module which has a foo
function. Whatever foo
can do
is up to you, reader. It may take a buffer's pointer/length and fill
in something interesting. Since the WASI module has a memory
export,
foo
has access to the instance memory and can do whatever it wants.
Custom imports is a powerful feature and provides more than one way to do things. For instance, if your Haskell program needs to read a large blob, you may either put that blob in a memfs and read it as if reading a file; or you can just feed that blob via a custom import. In this case, a custom import may be simpler, and it further lowers the requirement on the WASI implementation, since not even a virtual memfs implementation will be needed at run-time.
It's even possible for the imported function to invoke Haskell
computation by calling exported Haskell functions! This kind of RTS
re-entrance is permitted in GHC RTS, as long as the custom import is
marked with foreign import ccall safe
.
wizer
is a tool that
takes a wasm module, runs a user-specified initialization function,
then snapshots the wasm instance state into a new wasm module. Since
wizer
is based on wasmtime
, it supports WASI modules out of the
box.
I recommend using wizer
to pre-initialize your WASI reactor module
compiled from Haskell. It's not just about avoiding the overhead of
_initialize
; the initialization function run by wizer
is capable
of much more tasks, including but not limited to:
- Set up custom RTS flags and other command line arguments
- Perform arbitrary Haskell computation
- Perform Haskell garbage collection to re-arrange the heap in an optimal way
It requires a bit of knowledge about GHC's RTS API to write this initialization function, here's an example:
// Including this since we need access to GHC's RTS API. And it
// transitively includes pretty much all of libc headers that we need.
#include <Rts.h>
// When GHC compiles the Test module with foreign export, it'll
// generate Test_stub.h that declares the prototypes for C functions
// that wrap the corresponding Haskell functions.
#include "Test_stub.h"
// The prototype of hs_init_with_rtsopts is "void
// hs_init_with_rtsopts(int *argc, char **argv[])" which is a bit
// cumbersome to work with, hence this convenience wrapper.
STATIC_INLINE void hs_init_with_rtsopts_(char *argv[]) {
int argc;
for (argc = 0; argv[argc] != NULL; ++argc) {
}
hs_init_with_rtsopts(&argc, &argv);
}
void malloc_inspect_all(void (*handler)(void *start, void *end,
size_t used_bytes, void *callback_arg),
void *arg);
static void malloc_inspect_all_handler(void *start, void *end,
size_t used_bytes, void *callback_arg) {
if (used_bytes == 0) {
memset(start, 0, (size_t)end - (size_t)start);
}
}
extern char __stack_low;
extern char __stack_high;
// Export this function as "wizer.initialize". wizer also accepts
// "--init-func <init-func>" if you dislike this export name, or
// prefer to pass -Wl,--export=my_init at link-time.
//
// By the time this function is called, the WASI reactor _initialize
// has already been called by wizer. The export entries of this
// function and _initialize will both be stripped by wizer.
__attribute__((export_name("wizer.initialize"))) void __wizer_initialize(void) {
// The first argument is what you get in getProgName.
//
// -H64m sets the "suggested heap size" to 64MB and reserves so much
// memory when doing GC for the first time. It's not a hard limit,
// the RTS is perfectly capable of growing the heap beyond it, but
// it's still recommended to reserve a reasonably sized heap in the
// beginning. And it doesn't add 64MB to the wizer output, most of
// the grown memory will be zero anyway!
char *argv[] = {"test.wasm", "+RTS", "-H64m", "-RTS", NULL};
// The WASI reactor _initialize function only takes care of
// initializing the libc state. The GHC RTS needs to be initialized
// using one of hs_init* functions before doing any Haskell
// computation.
hs_init_with_rtsopts_(argv);
// Not interesting, I know. The point is you can perform any Haskell
// computation here! Or C/C++, whatever.
fib(10);
// Perform major GC to clean up the heap. The second run will invoke
// the C finalizers found during the first run.
hs_perform_gc();
hs_perform_gc();
// Zero out the unused RTS memory, to prevent the garbage bytes from
// being snapshotted into the final wasm module. Otherwise it
// wouldn't affect correctness, but the wasm module size would bloat
// significantly. It's only safe to call this after hs_perform_gc()
// has returned.
rts_clearMemory();
// Zero out the unused heap space. `malloc_inspect_all` is a
// dlmalloc internal function which traverses the heap space and can
// be used to zero out some space that's previously allocated and
// then freed. Upstream `wasi-libc` doesn't expose this function
// yet, we do since it's useful for this specific purpose.
malloc_inspect_all(malloc_inspect_all_handler, NULL);
// Zero out the entire stack region in the linear memory. This is
// only suitable to do after all other cleanup has been done and
// we're about to exit `__wizer_initialize`. `__stack_low` and
// `__stack_high` are linker generated symbols which resolve to the
// two ends of the stack region.
memset(&__stack_low, 0, &__stack_high - &__stack_low);
}
Then you can compile & link the C code above with a regular Haskell
module, and pre-initialize using wizer
:
wasm32-wasi-ghc test.hs test_c.c -o test.wasm -no-hs-main -optl-mexec-model=reactor -optl-Wl,--export=fib
wizer --allow-wasi --wasm-bulk-memory true test.wasm -o test.wizer.wasm
Note that test.wizer.wasm
will be slightly larger than test.wasm
,
which is expected behavior here, given some computation has already
been run and the linear memory captures more runtime data.
If you run wasm-opt
to minimize the wasm
module, it's recommend to
only run it for the wizer
output. wasm-opt
will be able to strip
away some unused initialization functions that are no longer reachable
via wasm exports or function table.
By default, only stdin/stdout/stderr is supported. To access the host
file system, one needs to map the allowed root directory into /
as a
WASI preopen.
The initial working directoy is always /
. If you'd like to specify
another working directory other than /
in the virtual filesystem,
use the PWD
environment variable. This is not a WASI standard, just
a workaround we implemented in the GHC RTS shall the need arises.
If you intend to build the GHC's wasm backend, here's a building guide, assuming you already have some experience with building native GHC.
To build the wasm backend, the systemwide C/C++ toolchain won't work.
You need to install our wasi-sdk
fork. Upstream wasi-sdk
won't work yet.
If your host system is one of {x86_64,aarch64}-{linux,darwin}
, then
you can avoid building and simply download & extract the GitLab CI
artifact from the latest master
commit. The linux artifacts are
statically linked and work out of the box on all distros; the macos
artifact contains universal binaries and works on either apple silicon
or intel mac.
If your host system is x86_64-linux
, you can use the setup.sh
script to install it, as documented in previous sections.
For simplicity, the following subsections all assume wasi-sdk
is
installed to ~/.ghc-wasm/wasi-sdk
, and
~/.ghc-wasm/wasi-sdk/bin/wasm32-wasi-clang
works.
Skip this subsection if wasi-sdk
is installed by setup.sh
instead
of extracting CI artifacts directly.
Extract the CI artifact of
libffi-wasm
, and copy
its contents:
cp *.h ~/.ghc-wasm/wasi-sdk/share/wasi-sysroot/include/wasm32-wasi
cp *.a ~/.ghc-wasm/wasi-sdk/share/wasi-sysroot/lib/wasm32-wasi
If you used setup.sh
to install wasi-sdk
/libffi-wasm
, then you
can source ~/.ghc-wasm/env
into your current shell to set the
following environment variables required for building. Otherwise, you
can save this to somewhere else and source that script.
export AR=~/.ghc-wasm/wasi-sdk/bin/llvm-ar
export CC=~/.ghc-wasm/wasi-sdk/bin/wasm32-wasi-clang
export CC_FOR_BUILD=cc
export CXX=~/.ghc-wasm/wasi-sdk/bin/wasm32-wasi-clang++
export LD=~/.ghc-wasm/wasi-sdk/bin/wasm-ld
export NM=~/.ghc-wasm/wasi-sdk/bin/llvm-nm
export OBJCOPY=~/.ghc-wasm/wasi-sdk/bin/llvm-objcopy
export OBJDUMP=~/.ghc-wasm/wasi-sdk/bin/llvm-objdump
export RANLIB=~/.ghc-wasm/wasi-sdk/bin/llvm-ranlib
export SIZE=~/.ghc-wasm/wasi-sdk/bin/llvm-size
export STRINGS=~/.ghc-wasm/wasi-sdk/bin/llvm-strings
export STRIP=~/.ghc-wasm/wasi-sdk/bin/llvm-strip
export CONF_CC_OPTS_STAGE2="-fno-strict-aliasing -Wno-error=implicit-function-declaration -Wno-error=int-conversion -O3 -msimd128 -mnontrapping-fptoint -msign-ext -mbulk-memory -mmutable-globals -mmultivalue -mreference-types"
export CONF_CXX_OPTS_STAGE2="-fno-exceptions -fno-strict-aliasing -Wno-error=implicit-function-declaration -Wno-error=int-conversion -O3 -msimd128 -mnontrapping-fptoint -msign-ext -mbulk-memory -mmutable-globals -mmultivalue -mreference-types"
export CONF_GCC_LINKER_OPTS_STAGE2="-Wl,--compress-relocations,--error-limit=0,--growable-table,--keep-section=ghc_wasm_jsffi,--stack-first,--strip-debug "
export CONFIGURE_ARGS="--target=wasm32-wasi --with-intree-gmp --with-system-libffi"
Checkout GHC.
The rest is the usual boot & configure & build process. You need to
ensure the environment variables described earlier are correctly set
up; for ghc.nix
users, it sets up a default CONFIGURE_ARGS
in the
nix-shell which is incompatible, and the env
script set up by
setup.sh
respects existing CONFIGURE_ARGS
, so don't forget to
unset it first!
Configure with ./configure $CONFIGURE_ARGS
, then build with hadrian.
After the build completes, you can compile stuff to wasm using
_build/stage1/bin/wasm32-wasi-ghc
.
Happy hacking!
If you suspect there's something wrong that needs fixing in the wasm
backend (e.g. runtime crashes), you're more than welcome to open a
ticket in the GHC issue tracker! The wasm
tag will be added to the
ticket by one of the triagers, which ensures it ends up in my inbox.
It would be very nice to have a self-contained Haskell source file to reproduce the bug, but it's not a strict necessity to report the bug. It's fine as long as the project source code and build instruction is available.
Do include the following info in the ticket:
- Which
ghc-wasm-meta
revision is used to installwasm32-wasi-ghc
? Or if you're building it yourself, which GHC revision are you using?
To get some more insight on the bug, there are a few things worth checking, importance ranked from high to low:
- Try a different runtime environment. If it is a WASI command module
(self-contained
.wasm
file), does it work inwasmtime
or some other runtime? If it is meant for the browser, try switching to a different WASI implementation, does the same bug still occur? It's known that all current JavaScript implementations of WASI are not 100% feature complete and may have bugs. - Try a different optimization level in GHC. If it's a cabal project,
you can configure the
optimization:
field incabal.project
. In case of a code generation bug, it may be the case that one of the passes during GHC optimization resulted in buggy code. - Try enabling
-dlint
at compile-time to check consistency. - If it crashes, make it crash as early/verbose as possible. Use the
GHC
-debug
flag at link time to link with the debug RTS that enables certain internal checks. On top of that, pass RTS flags to enable certain checks, e.g.+RTS -DS
for sanity checks, see here for details.