The Idea
Last year, towards the end of summer, I found myself without a project at work. Since I perceived an uptick in companies looking for Go developers, improving my knowledge in that space seemed like a good idea. Just following introductory tutorials is not really my thing, so I began looking for an idea.
At the time, I was doing the daily Leetcode problem. So I went in a similar direction, but with a focus on reviewable code instead of automatic correctness checks. The result was this unfinished application.
Due to my approach, I didn’t need to actually check if the submitted code was working or not. But it did make me think about how I would do that. Coincidentally, I spent some time looking into WebAssembly earlier that year …
Cutting to the chase: the idea is to utilize WebAssembly's inherently sandboxed execution environment to run on-demand compiled code.
WebAssembly
Let's learn about WebAssembly before we get going and look at the official definition:

"WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications."

- webassembly.org

Aha! Now that everything is clear … alright, maybe we should unwrap what that means first.
Most programming languages are converted from their human-readable representation into machine-executable code. This involves a couple of steps, but towards the end of the process, assembly code is assembled into machine code by an assembler. This assembly code is specific to the underlying CPU architecture (because different architectures use different instruction sets).
Wasm, on the other hand, doesn't run on the CPU directly. Instead, a stack-based virtual machine (though that's a point of contention) is used to execute the Wasm code. This makes Wasm effectively platform agnostic. Modern browsers ship with embedded runtimes (e.g. Chrome's V8), which enable Wasm execution and interoperability with JavaScript. That means you can call Wasm functions from your client-side frontend code and handle more computationally intensive tasks in a high-performance environment (near-native speed, if you want the marketing phrasing).
While WebAssembly is, as the name implies, first and foremost a web standard, it can also be executed outside the browser, using standalone runtimes like Wasmtime. This is great for what we're looking to do, because Wasm is also a sandboxed execution environment, meaning the guest environment can't access the underlying system resources of the host. This restriction can be somewhat softened by WASI, the WebAssembly System Interface, which extends Wasm's capabilities to interact with the host through carefully defined interfaces.
But how do we write Wasm code if it's designed as a binary format? Good question! While you could write it by hand using its text format, there's an easier way. Wasm is a compilation target for a lot of programming languages. So you can write the code in your favorite language and compile it to one of the available Wasm targets (e.g. wasm32-wasip1).
Limitations
During my exploration of this topic, I encountered some issues along the way. Most of them I could overcome, some are inherent to working with WebAssembly, and some I just had to live with for now.
Types
Wasm only supports a very minimal set of value types (i32, i64, f32, f64). That's not a lot to work with, and we'll have to be a bit "creative", for lack of a better term, to still do what we want. There are libraries that help with overcoming these restrictions, but for the purpose of learning, I avoided using those.
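To get a feel for what fits through this type system without tricks, here's a minimal (hypothetical) export whose signature uses only Wasm value types; anything richer, like strings or structs, has to be encoded into these scalars by hand, which we'll have to deal with later.

```rust
// A signature made only of Wasm value types (i32 here) needs no tricks:
// it can be exported and called across the boundary as-is.
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Called natively here for illustration; compiled to Wasm, a runtime
    // could invoke `add` directly with two i32 arguments.
    println!("{}", add(20, 22));
}
```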
Multi-Value
There is Wasm support for multi-value returns, e.g. having a function return two or more i32. I tried getting this to work, as it would have been very useful, but even though compilation ran without errors, the resulting Wasm was missing the correct function definitions. It appears that, while the documentation for Wasmtime still has examples for multi-value use, that functionality is currently not supported with Rust (the wording there is a bit confusing).
Implementation
I'm doing this in Rust, but the concepts translate to other languages, so feel free to replicate it using Go or any other supported language, if that's your jam. The code shown only highlights the parts relevant to Wasm execution, while the full project also utilizes Axum to provide the demo at the bottom of this post. The complete code can be found on my GitHub.
Boilerplate
Let's start with some type definitions. We'll need an enum of supported function variants, a struct that contains the stringified user input and the variant of Function that we want to run, and an output struct with optional log and output fields.
```rust
#[derive(Deserialize, Serialize)]
#[serde(rename_all = "snake_case")]
enum Function {
    Arbitrary,
    Param,
}

#[derive(Serialize)]
pub struct ExecutionResult {
    log: Option<String>,
    out: Option<String>,
}

pub struct CodeSubmission {
    user_input: String,
    function: Function,
}
```
```rust
pub fn main() {
    let payload = CodeSubmission {
        user_input: String::from(
            "pub fn execute() -> String { return String::from(\"Hello World!\");}",
        ),
        function: Function::Arbitrary,
    };

    let res = match compile_and_run_wasm(&payload) {
        Ok(result) => result,
        Err(err) => ExecutionResult {
            log: Some(format!("Error: {}", err)),
            out: None,
        },
    };
    // do something with the result, e.g. print it
}
```
Compilation
The given code has to be compiled to a .wasm binary at runtime. If the code you want to compile is dependency-free, i.e. it's only using the standard library, this can be pretty straightforward. You'd just use rustc, specify your source file and an output file, set the target to wasm32-wasip1 and voilà, .wasm file ready to use. But, due to the limitations of Wasm's type system, we require the support of serde and serde_json a little later.
I've tried getting this to work with rustc directly, precompiling the dependencies to Wasm and linking them manually. But for whatever reason it didn't want to work. Additionally, that strategy doesn't scale very well if you ever need a lot of dependencies (although I'd wager that if you need more than a few dependencies, the software you're building is probably not supposed to be compiled on-demand).
My solution for this problem was to create a /templates folder in the project root that contains some template cargo projects (one per Function variant). We're going to copy those into temporary directories (using tempdir), choosing the template that matches the requested function, write the input code to it and adjust the package name in the Cargo.toml.
```rust
fn compile_and_run_wasm(payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    let temp_dir = tempdir()?;
    let template = format!(
        "templates/{}",
        json!(&payload.function)
            .as_str()
            .ok_or(anyhow!("Error serializing function enum"))?
    );
    let src_dir = Path::new(&template);
    let dst_dir = temp_dir.path();
    copy_template(src_dir, dst_dir)?;
    write_file(&dst_dir.join("src/scaffold.rs"), &payload.user_input)?;

    let unique_name = format!(
        "user{}",
        dst_dir
            .file_name()
            .ok_or(anyhow!("Error reading generated temp name."))?
            .to_str()
            .ok_or(anyhow!("Error converting temp name to string"))?
    )
    .replace(".", "_");
    customize_cargo(&dst_dir.join("Cargo.toml"), &unique_name)?;
    // ...
}
```
```rust
// write user code to file
fn write_file(path: &Path, source_code: &str) -> Result<(), anyhow::Error> {
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(path)?;
    write!(file, "{}", source_code)?;
    Ok(())
}
```
```rust
// appends the name attribute to the Cargo.toml in the temp dir
fn customize_cargo(path: &Path, unique_name: &str) -> Result<(), anyhow::Error> {
    let mut file = OpenOptions::new().append(true).open(path)?;
    writeln!(file, "name = \"{}\"", unique_name)?;
    Ok(())
}
```
```rust
// recursively copies a directory and its content to the destination
fn copy_template(src: &Path, dst: &Path) -> Result<(), anyhow::Error> {
    if !dst.exists() {
        fs::create_dir_all(dst)?;
    }
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let src_path = entry.path();
        let dst_path = dst.join(entry.file_name());
        let ty = entry.file_type()?;
        if ty.is_dir() {
            copy_template(&src_path, &dst_path)?;
        } else {
            fs::copy(src_path, dst_path)?;
        }
    }
    Ok(())
}
```
This project is then compiled using cargo, which takes care of the dependency linking for us, and the output is saved to the --target-dir specified in the command. That way, we effectively cache the compilation results for the dependencies and avoid recompiling them every time.
```rust
fn compile_and_run_wasm(payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    // ...
    let target_dir = env::current_dir()?.join("target");
    let output = Command::new("cargo")
        .args([
            "build",
            "--release",
            "--target",
            "wasm32-wasip1",
            "--target-dir",
            target_dir
                .to_str()
                .ok_or(anyhow!("Failed to convert target directory to str."))?,
        ])
        .current_dir(dst_dir)
        .output()?;

    if !output.status.success() {
        return Err(anyhow!(
            "Compilation failed: {}",
            String::from_utf8_lossy(&output.stderr)
        ));
    }
    run_wasm(&unique_name, payload)
}
```
Templates
Let's take a look at the templates folder. I'll just show a minimal example here, so that the following code makes sense, and expand on it later. The directory is structured like this:

```
/templates
  /arbitrary
    /src
      lib.rs
      scaffold.rs
    Cargo.toml
  /param
    ...
```

And the file content in e.g. the arbitrary template:
```rust
// lib.rs
#[no_mangle]
pub extern "C" fn run() -> *const i32 {
    let message = format!("{:?}", execute());
    let ptr = message.as_ptr() as i32;
    let length = message.len() as i32;
    std::mem::forget(message);
    let res = Box::new([ptr, length]);
    Box::into_raw(res) as *const i32
}
```

```rust
// scaffold.rs
use std::fmt;

pub fn execute() -> impl fmt::Debug {
    String::from("Hello from Wasm!")
}
```
We define a public function that follows the C calling conventions (extern "C") with #[no_mangle], so that we can later call that function from Rust under its correct (unmangled) name, in this case run.
This function returns *const i32, a raw pointer to a 32-bit integer. Remember, we are limited in what types we can return from Wasm. We then call execute() and use it as a parameter in the format!() macro with debug printing. In this example, execute is already defined, but this would be the code we expect the user to implement.

Next, we get the pointer to the address of the resulting string, and also its length. Then we std::mem::forget() the string. Whoops!? Don't worry. We're just doing that to avoid the memory clean-up that happens when our string goes out of scope at the end of the function. Now we box an array containing pointer and length and return the pointer to that memory.

All of this is necessary, since we can't return the string directly (as it's not a Wasm type). But a String is just a pointer and a length. With that info, we can retrieve its content later. This is where the missing multi-value return support really stung. We can't return both the pointer and the length directly, so we need to go the extra step of packing them into an array. We can then read that array from the Rust side of things and use the two values inside to get our string. I'll show how in a bit.
Note that the code in the template's scaffold.rs will be overwritten by the input code when the temporary directory is created. The demo below uses it to get the initial state of the code editor.

Another important part of these template projects is that they are compiled to dynamic system libraries. Otherwise you'd need a main function, and it's just in general not what we're looking for. So the Cargo.toml of each template includes this:
```toml
[lib]
crate-type = ["cdylib"]
```
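A side note on customize_cargo's append trick: in TOML, a key appended at the end of the file lands in whichever table was opened last, so the template's manifest has to keep its [package] table at the bottom, without a name key. A hypothetical template Cargo.toml (dependency versions made up for illustration) could look like this:

```toml
[lib]
crate-type = ["cdylib"]

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# [package] intentionally last and without `name`: customize_cargo appends
# the unique `name = "user…"` line to the end of the file, and that line
# joins whichever table is open at that point.
[package]
version = "0.1.0"
edition = "2021"
```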
Wasmtime
Now that we have a compiled .wasm file, we can run it. For that, we need a runtime. I'm using Wasmtime here.

First, we need to get some boilerplate out of the way: an engine, a linker with the WASI imports, a store holding the WASI context, and the module loaded from our compiled file. We then call the run function exported from the instance and store its return value in ptr. The memory and our ptr result are passed to resolve_string, which will retrieve our string data from the memory. We can also remove the files generated by cargo. For now, we only return the result string in our response struct.
```rust
fn run_wasm(file_name: &str, payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    let engine = Engine::new(wasmtime::Config::new())?;
    let mut linker = Linker::new(&engine);
    wasi_common::sync::add_to_linker(&mut linker, |s| s)?; // include necessary imports
    let wasi = WasiCtxBuilder::new().build();
    let mut store = Store::new(&engine, wasi);
    let module = Module::from_file(
        &engine,
        format!("target/wasm32-wasip1/release/{file_name}.wasm"),
    )?;
    let instance = linker.instantiate(&mut store, &module)?;
    let memory: Memory = instance.get_memory(&mut store, "memory").unwrap();

    let run = instance.get_typed_func::<(), i32>(&mut store, "run")?;
    let ptr = run.call(&mut store, ())? as usize;
    let result = resolve_string(memory.data(&store), ptr)?;

    fs::remove_file(format!("target/wasm32-wasip1/release/{file_name}.wasm"))?;
    fs::remove_file(format!("target/wasm32-wasip1/release/{file_name}.d"))?;

    Ok(ExecutionResult {
        log: None,
        out: Some(result),
    })
}
```
Since our run wrapper function returns a pointer to the place in Wasm memory where our pair of pointer and length is written, we now need to convert that information back into the actual string:
```rust
fn resolve_string(memory: &[u8], ptr: usize) -> Result<String, anyhow::Error> {
    let slice = &memory[ptr..ptr + 8]; // slice 8 bytes from the wasm memory
    let (ptr, len) = slice.split_at(4); // split them into 4 + 4
    // convert them into two i32: the pointer to our string and its length
    let (str_ptr, length) = (
        i32::from_ne_bytes(ptr.try_into()?) as usize,
        i32::from_ne_bytes(len.try_into()?) as usize,
    );
    let string_bytes = &memory[str_ptr..str_ptr + length]; // raw bytes composing our string
    Ok(String::from_utf8(string_bytes.to_vec())?) // parse back into a string and return
}
```
Handling raw memory never feels great, but it should be pretty safe here. We're using the pointer received from calling run() to slice an 8-byte chunk out of the linear Wasm memory. These 8 bytes hold the data of the pointer and length that we put into the boxed array. Then, we split it into two parts of 4 bytes each, parse those into i32 and cast them to usize (since we need to use them for indexing). We can then use those to slice into the Wasm memory once again, retrieve the bytes comprising the string, and convert them using String::from_utf8().
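The whole round trip can be exercised natively with a plain byte buffer standing in for linear memory and offsets standing in for pointers (a hypothetical sketch; inside a real Wasm instance the "pointer" is already an offset into the instance's memory, while host pointers are 64-bit):

```rust
// Hypothetical sketch: a Vec<u8> stands in for the Wasm linear memory and
// offsets stand in for pointers. `pack` plays the role of the template's
// `run`, `unpack` the role of `resolve_string`.
fn pack(memory: &mut Vec<u8>, s: &str) -> [u8; 8] {
    let offset = memory.len() as i32; // "pointer" into our fake linear memory
    memory.extend_from_slice(s.as_bytes());
    let mut packed = [0u8; 8];
    packed[..4].copy_from_slice(&offset.to_ne_bytes());
    packed[4..].copy_from_slice(&(s.len() as i32).to_ne_bytes());
    packed
}

fn unpack(memory: &[u8], packed: &[u8; 8]) -> String {
    let offset = i32::from_ne_bytes(packed[..4].try_into().unwrap()) as usize;
    let length = i32::from_ne_bytes(packed[4..].try_into().unwrap()) as usize;
    String::from_utf8(memory[offset..offset + length].to_vec()).unwrap()
}

fn main() {
    let mut memory = Vec::new();
    let packed = pack(&mut memory, "Hello from Wasm!");
    assert_eq!(unpack(&memory, &packed), "Hello from Wasm!");
    println!("round trip ok");
}
```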
Arbitrary Data
So far, so good. We can now read string data from our Wasm memory. But it would be great to not be limited to that. What if we want to return a complex custom struct?
Hmm … shipping structured data between components that speak different languages … and we're already able to return stringified data? Let's adjust the code to utilize serde_json for serializing and deserializing our data. We're just changing the type of out to serde_json::Value. Then we deserialize the result string into that type.
```rust
#[derive(Serialize)]
pub struct ExecutionResult {
    log: Option<String>,
    out: serde_json::Value,
}

fn run_wasm(file_name: &str, payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    // ...
    let result = resolve_string(memory.data(&store), ptr)?;
    let deserialized: serde_json::Value = serde_json::from_str(&result)?;
    Ok(ExecutionResult {
        log: None,
        out: deserialized,
    })
}
```
We should also adjust our run function in the template. Instead of using the format! macro to convert the result of the executed code, let's serialize it using serde_json.
```rust
#[no_mangle]
pub extern "C" fn run() -> *const i32 {
    let message = serde_json::to_string(&execute()).unwrap();
    // ...
}
```
And with that, we're able to ferry any struct that implements Serialize across the Wasm/Rust boundary. Now, when we want to return a custom struct, it just needs to #[derive(Serialize)].
stdout
We're already returning a lot of useful information, e.g. error logs of compilation issues. But what's happening inside the Wasm instance is sort of a black box for now. The ability to use print debugging to check what's wrong, when the code returns successfully but the result is wrong, would be great.

So what we need is a way to capture the standard output of our Wasm instance. By default, the guest's stdout and stderr are swallowed; whatever is written to them doesn't go anywhere. We could use inherit_stdout() when building our WasiCtx, which would configure the Wasm stdout stream to write to the host process's stdout. But then we would still need to capture that in some way.
Instead, we're creating a dynamically sized buffer, which we use as the base of a WritePipe that is passed into the WASI context as the target for stdout. The buffer needs to be wrapped in Arc<RwLock<>> since it's a shared resource and we need to avoid access conflicts. After running the code, we can then simply read the buffer back into a string and return it as the log of a successful execution.
```rust
fn run_wasm(file_name: &str, payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    // ...
    let stdout_buffer = Arc::new(RwLock::new(Vec::new()));
    let stdout_pipe = WritePipe::from_shared(stdout_buffer.clone());
    let wasi = WasiCtxBuilder::new()
        .stdout(Box::new(stdout_pipe))
        .build();
    // ...
    let read = stdout_buffer.read().unwrap();
    let output = String::from_utf8_lossy(&read);
    let stdout = (!output.is_empty()).then(|| output.to_string());
    // ...
    Ok(ExecutionResult {
        log: stdout,
        out: deserialized,
    })
}
```
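To see why a shared buffer behind Arc<RwLock<>> works as a stdout sink, here's a std-only sketch of the pattern WritePipe::from_shared builds on: a writer that appends into the shared buffer, which the host can read back afterwards. (SharedPipe is a hypothetical stand-in, not the wasi-common type.)

```rust
use std::io::Write;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for WritePipe: everything written through the clone
// lands in the same shared buffer we read from later.
struct SharedPipe(Arc<RwLock<Vec<u8>>>);

impl Write for SharedPipe {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.write().unwrap().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn main() {
    let buffer = Arc::new(RwLock::new(Vec::new()));
    let mut pipe = SharedPipe(buffer.clone());
    // this write models the guest printing to its stdout
    writeln!(pipe, "debug from the guest").unwrap();
    let read = buffer.read().unwrap();
    assert_eq!(String::from_utf8_lossy(&read), "debug from the guest\n");
}
```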
Parameter
By now, we're pretty comfortable with returning data from Wasm back to our host process. How about getting something in, though? For that, we'll once again have to adjust our code a bit. We need to check if the selected function takes a parameter, because those require a different type signature for get_typed_func. In our example, that's the case for Function::Param.

The process by which we get our parameter into the function is very similar to getting a return value out. We write the parameter (the String as a byte array) into the instance memory at a specified offset (here 0, so the beginning of the linear memory), which will be the pointer address inside the Wasm code. Multiple parameters are supported, as opposed to multiple returns, so we don't need to take the extra step with the [pointer, length] array.
```rust
fn run_wasm(file_name: &str, payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    // ...
    let ptr;
    if let Function::Param = payload.function {
        let run = instance.get_typed_func::<(i32, i32), i32>(&mut store, "run")?;
        let offset = 0;
        let length = payload.user_input.len();
        memory.write(&mut store, offset, payload.user_input.as_bytes())?;
        ptr = run.call(&mut store, (offset as i32, length as i32))? as usize;
    } else {
        let run = instance.get_typed_func::<(), i32>(&mut store, "run")?;
        ptr = run.call(&mut store, ())? as usize;
    }
    // ...
}
```
And in our template project we simply read the raw memory at the offset back into a String of the given length.
```rust
#[no_mangle]
pub extern "C" fn run(ptr: *const u8, length: i32) -> *const i32 {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, length as usize) };
    let input = String::from_utf8(bytes.to_vec()).unwrap();
    // ...
}
```
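The hand-off can be sketched natively as well, with a byte buffer standing in for the instance memory and a hypothetical guest_run in place of the template's run (to_uppercase stands in for whatever the user's function does with the input):

```rust
// Hypothetical native sketch of the parameter hand-off: the host writes the
// input bytes at an offset (like memory.write), the guest reconstructs the
// string from (offset, length) (like from_raw_parts in the template).
fn guest_run(memory: &[u8], offset: usize, length: usize) -> String {
    let input = std::str::from_utf8(&memory[offset..offset + length]).unwrap();
    input.to_uppercase() // stand-in for the user's function
}

fn main() {
    let mut memory = vec![0u8; 64]; // stand-in for Wasm linear memory
    let input = "hello wasm";
    memory[..input.len()].copy_from_slice(input.as_bytes());
    assert_eq!(guest_run(&memory, 0, input.len()), "HELLO WASM");
}
```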
Since this example is using the same field for both code and input parameter, we don't write the input to the temporary directory for Function::Param. But of course you can adjust this so that a function supports both an input parameter and input code.
```rust
fn compile_and_run_wasm(payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    // ...
    if !matches!(payload.function, Function::Param) {
        write_file(&dst_dir.join("src/scaffold.rs"), &payload.user_input)?;
    }
    // ...
}
```
Circuit Breaker
There's one last thing we need to do before we can play with the demo. As it stands, passing in code that contains e.g. an infinite loop {} would render the program unresponsive. We could use something like tokio, run Wasmtime in a task and implement a timeout, but Wasmtime provides another avenue.

The engine can be configured to consume fuel. Most instructions (not all) consume fuel, and when our tank is empty, the runtime aborts the process. This involves just two small code changes:
```rust
fn run_wasm(file_name: &str, payload: &CodeSubmission) -> Result<ExecutionResult, anyhow::Error> {
    let engine = Engine::new(wasmtime::Config::new().consume_fuel(true))?;
    // ...
    let mut store = Store::new(&engine, wasi);
    store.set_fuel(500_000)?;
    // ...
}
```
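The fuel mechanic can be modeled in plain Rust. This is only a conceptual sketch (Wasmtime meters actual Wasm instructions, not loop iterations): every step of the "user code" costs a unit of fuel, and an oversized workload returns an error instead of hanging the host.

```rust
// Hypothetical model of the fuel mechanic: each loop iteration costs one
// unit of fuel; running dry aborts with an error instead of looping forever.
fn metered_sum(n: u64, mut fuel: u64) -> Result<u64, String> {
    let mut total = 0;
    for i in 1..=n {
        fuel = fuel.checked_sub(1).ok_or("out of fuel")?;
        total += i;
    }
    Ok(total)
}

fn main() {
    // enough fuel: the computation finishes normally
    assert_eq!(metered_sum(10, 500), Ok(55));
    // an oversized workload runs dry instead of blocking the host forever
    assert_eq!(metered_sum(u64::MAX, 500_000), Err("out of fuel".into()));
}
```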
That's it. We tell the engine to consume_fuel and load up the tank. When the runtime aborts, a not-so-readable error is returned. Since that's not very useful, we'll check for that specific error (Trap::OutOfFuel) and adjust the error log in that case.
```rust
pub fn main() {
    // ...
    let res = match compile_and_run_wasm(&payload) {
        Ok(result) => result,
        Err(err) => {
            if matches!(err.downcast_ref::<Trap>(), Some(Trap::OutOfFuel)) {
                ExecutionResult {
                    log: Some(String::from(
                        "Instruction maximum exceeded. Aborted execution to avoid DOS.",
                    )),
                    out: serde_json::Value::Null,
                }
            } else {
                ExecutionResult {
                    log: Some(format!("Error: {}", err)),
                    out: serde_json::Value::Null,
                }
            }
        }
    };
    // do something with the result
}
```
And now we’re done!
Demo
Let's see it in action. You can select one of the available functions from the dropdown. The appropriate scaffold should be shown in the code editor below. You're encouraged to increase the limit in the Prime Number example. At some point, this should trigger our fuel mechanic and abort execution.