Investigate WASM "calling conventions" and passing non-scalar datatypes like strings #106
wasm UDF communication

Passing native primitive datatypes (

WIT

In the long run, WebAssembly Interface Types (
import code: WIT-generated calling code, in our case run by the seafowl process.

```rust
#[allow(clippy::all)]
mod input {
    pub fn upper(s: &str) -> String {
        unsafe {
            // The string argument is passed as a (pointer, length) pair.
            let vec0 = s;
            let ptr0 = vec0.as_ptr() as i32;
            let len0 = vec0.len() as i32;
            // 8-byte, 4-byte-aligned return area: result pointer at offset 0, length at offset 4.
            #[repr(align(4))]
            struct __InputRetArea([u8; 8]);
            let mut __input_ret_area: __InputRetArea = __InputRetArea([0; 8]);
            let ptr1 = __input_ret_area.0.as_mut_ptr() as i32;
            #[link(wasm_import_module = "input")]
            extern "C" {
                #[cfg_attr(target_arch = "wasm32", link_name = "upper: func(s: string) -> string")]
                #[cfg_attr(not(target_arch = "wasm32"), link_name = "input_upper: func(s: string) -> string")]
                fn wit_import(_: i32, _: i32, _: i32);
            }
            wit_import(ptr0, len0, ptr1);
            // Read the result's length and pointer from the return area and take ownership of the buffer.
            let len2 = *((ptr1 + 4) as *const i32) as usize;
            String::from_utf8(Vec::from_raw_parts(*((ptr1 + 0) as *const i32) as *mut _, len2, len2)).unwrap()
        }
    }
}
```

export code: WIT-generated wrapper around guest code (in our case the UDF).

```rust
#[allow(clippy::all)]
mod input {
    // Exported entry point: the string argument arrives as (pointer, length) in the guest's linear memory.
    #[export_name = "upper: func(s: string) -> string"]
    unsafe extern "C" fn __wit_bindgen_input_upper(arg0: i32, arg1: i32) -> i32 {
        let len0 = arg1 as usize;
        let result1 = <super::Input as Input>::upper(String::from_utf8(Vec::from_raw_parts(arg0 as *mut _, len0, len0)).unwrap());
        // Store the result's pointer and length in the static return area and return its address.
        let ptr2 = __INPUT_RET_AREA.0.as_mut_ptr() as i32;
        let vec3 = (result1.into_bytes()).into_boxed_slice();
        let ptr3 = vec3.as_ptr() as i32;
        let len3 = vec3.len() as i32;
        core::mem::forget(vec3); // keep the buffer alive until cabi_post_upper frees it
        *((ptr2 + 4) as *mut i32) = len3;
        *((ptr2 + 0) as *mut i32) = ptr3;
        ptr2
    }
    // Called by the host once it has read the result, to free the result buffer.
    #[export_name = "cabi_post_upper"]
    unsafe extern "C" fn __wit_bindgen_input_upper_post_return(arg0: i32) {
        wit_bindgen_guest_rust::rt::dealloc(*((arg0 + 0) as *const i32), (*((arg0 + 4) as *const i32)) as usize, 1);
    }
    #[repr(align(4))]
    struct __InputRetArea([u8; 8]);
    static mut __INPUT_RET_AREA: __InputRetArea = __InputRetArea([0; 8]);
    pub trait Input {
        fn upper(s: String) -> String;
    }
}
```

There exists a very early, pre-alpha WIT implementation for Rust supporting both Rust hosts and WASM guests. The developers urge anyone interested in using it in production to hold their horses and look for alternatives while the WIT standard is finalized, which I'd guess will take somewhere between 12 and 18 months.

Alternatives until WIT can be used

Passing raw strings

The least ambitious, but by no means the easiest, approach is to extend the integer and float types currently supported in seafowl UDFs with strings. Not only would this provide support for CHAR, TEXT and VARCHAR types in UDFs, but more complex data structures could also be passed as strings serialized with JSON, MessagePack, CBOR, etc. I wrote an example proof of concept. The complexity stems from the following:
If the strings aren't necessarily UTF-8 text but rather MessagePack-encoded streams of values, then all of the function arguments could be encoded in a single string, resulting in a simplified UDF WASM function signature:
Here, the result is a pointer to a Pascal-style string, like in the WIT-generated code.

waPC

The waPC project attempts to simplify WASM host-guest RPC. It provides a Rust host and a number of supported guest languages, and has its own GraphQL-inspired IDL (WIDL). Based on GitHub activity, it seems to be an active project, but it lacks significant backing: until recently, it was written mostly by three people at a startup called Vino. Links to step-by-step tutorials are all broken. By default, waPC uses MessagePack to serialize data.

wasm-bindgen

As a name that kept coming up during my research, wasm-bindgen deserves a mention. It's a mature solution for WASM RPC, but unfortunately limited to calls from a JavaScript host into a Rust WASM guest module. There was experimental support for WIT, but it is no longer supported. In a future where WIT support returns,

WASI-based communication

The WebAssembly System Interface (WASI) extends WASM with an interface that lets module functions interact with the host filesystem, command-line arguments, environment variables, etc.

Recommendation

Everyone, including myself, looks upon WIT as the "ultimate" solution to WASM RPC. Unfortunately, when WIT will stabilize is anyone's guess. The good news is that we don't have to commit to a single UDF interface for all time. Seafowl already expects a

If the overhead of using WASI is acceptable, reading serialized input from

For "normal" UDFs, the input consists of a tuple of supported Arrow types, so the serialized input could look something like this:
Currently, our WASM functions only support passing basic types like ints and floats. To pass something more complex, like strings or datetimes, we want to put the value in WASM memory and point the UDF to it.
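A minimal sketch of what "put it in WASM memory and point the UDF to it" involves, assuming a WIT-like convention: the argument travels as a (pointer, length) pair, and the UDF writes its result's pointer and length at offsets 0 and 4 of an 8-byte return area, as in the wit-bindgen output quoted in this thread. No WASM runtime is used: the guest's linear memory is simulated by a `Vec<u8>` with a naive bump allocator, and `LinearMemory`, `guest_upper` and `host_call_upper` are hypothetical names. A real host would instead have to call an allocator exported by the guest and access memory through the runtime's API.

```rust
/// Hypothetical stand-in for a guest module's linear memory: a flat,
/// little-endian byte buffer addressed by i32 offsets, plus a bump allocator.
pub struct LinearMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl LinearMemory {
    pub fn new(size: usize) -> Self {
        // Leave the first 8 bytes unused so offset 0 never looks like a valid allocation.
        LinearMemory { bytes: vec![0; size], next_free: 8 }
    }

    /// Reserve `len` bytes starting at a multiple of `align` and return the offset.
    pub fn alloc(&mut self, len: usize, align: usize) -> i32 {
        let start = (self.next_free + align - 1) / align * align;
        assert!(start + len <= self.bytes.len(), "out of linear memory");
        self.next_free = start + len;
        start as i32
    }

    pub fn write_bytes(&mut self, ptr: i32, data: &[u8]) {
        let start = ptr as usize;
        self.bytes[start..start + data.len()].copy_from_slice(data);
    }

    pub fn read_bytes(&self, ptr: i32, len: i32) -> &[u8] {
        let start = ptr as usize;
        &self.bytes[start..start + len as usize]
    }

    /// WASM linear memory is little-endian, so i32 values are stored that way.
    pub fn write_i32(&mut self, ptr: i32, value: i32) {
        self.write_bytes(ptr, &value.to_le_bytes());
    }

    pub fn read_i32(&self, ptr: i32) -> i32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(self.read_bytes(ptr, 4));
        i32::from_le_bytes(buf)
    }
}

/// Hypothetical guest-side UDF: reads its argument from (ptr, len) and writes
/// the result's (ptr, len) pair into the 8-byte return area at `ret_area`.
pub fn guest_upper(mem: &mut LinearMemory, ptr: i32, len: i32, ret_area: i32) {
    let output = std::str::from_utf8(mem.read_bytes(ptr, len))
        .expect("argument is not valid UTF-8")
        .to_uppercase();
    let out_ptr = mem.alloc(output.len(), 1);
    mem.write_bytes(out_ptr, output.as_bytes());
    mem.write_i32(ret_area, out_ptr); // offset 0: result pointer
    mem.write_i32(ret_area + 4, output.len() as i32); // offset 4: result length
}

/// Host side: copy the argument into guest memory, call the UDF, then read
/// the result back through the return area.
pub fn host_call_upper(mem: &mut LinearMemory, s: &str) -> String {
    let arg_ptr = mem.alloc(s.len(), 1);
    mem.write_bytes(arg_ptr, s.as_bytes());
    let ret_area = mem.alloc(8, 4); // 8 bytes, 4-byte aligned
    guest_upper(mem, arg_ptr, s.len() as i32, ret_area);
    let out_ptr = mem.read_i32(ret_area);
    let out_len = mem.read_i32(ret_area + 4);
    String::from_utf8(mem.read_bytes(out_ptr, out_len).to_vec())
        .expect("result is not valid UTF-8")
}
```

On a fresh `LinearMemory::new(1024)`, `host_call_upper(&mut mem, "seafowl")` returns `"SEAFOWL"`. The bump allocator never frees anything; a real design would need a hook like the `cabi_post_*` export in the WIT-generated code to release result buffers.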
We need to figure out the most ergonomic way for the function writer to do this. For reference, something like this:
compiles to:
This should work out of the box, without having to write a wrapper that converts some binary representation into a C string.
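For contrast, a minimal sketch of the kind of hand-written wrapper a UDF author would otherwise need, here using a (pointer, length) pair rather than a NUL-terminated C string. `upper_shim`, `RawStr` and `free_raw_str` are hypothetical names, not part of seafowl. On a wasm32 target pointers and `usize` are 32 bits wide, but the same code also compiles natively, which makes it possible to exercise outside WASM.

```rust
/// What the UDF author would ideally write: plain Rust types, no pointers.
pub fn upper(s: &str) -> String {
    s.to_uppercase()
}

/// A (pointer, length) pair describing a byte buffer owned by the guest.
#[repr(C)]
pub struct RawStr {
    pub ptr: *mut u8,
    pub len: usize,
}

/// Hypothetical hand-written shim: rebuilds a `&str` from the host's
/// (ptr, len) argument, calls `upper`, and hands the result buffer to the caller.
///
/// # Safety
/// `ptr` must point to `len` initialized bytes that stay valid during the call.
pub unsafe extern "C" fn upper_shim(ptr: *const u8, len: usize) -> RawStr {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    let input = std::str::from_utf8(bytes).expect("argument is not valid UTF-8");
    let result: Box<[u8]> = upper(input).into_bytes().into_boxed_slice();
    let out_len = result.len();
    // Leak the buffer; the caller must hand it back via `free_raw_str`.
    let out_ptr = Box::into_raw(result) as *mut u8;
    RawStr { ptr: out_ptr, len: out_len }
}

/// Frees a buffer returned by `upper_shim`, playing the role of the
/// `cabi_post_*` export in the WIT-generated code.
///
/// # Safety
/// `s` must come from `upper_shim` and must not have been freed already.
pub unsafe extern "C" fn free_raw_str(s: RawStr) {
    let slice = std::ptr::slice_from_raw_parts_mut(s.ptr, s.len);
    drop(unsafe { Box::from_raw(slice) });
}
```

Everything except `upper` is boilerplate that could, in principle, be generated (by a macro or a bindings generator) instead of being written by every UDF author.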