merge: #2969
2969: [wip] firecracker spawn variant r=fnichol a=johnrwatson

Adds a new function execution platform variant of `LocalUdsRuntimeStrategy`, named `LocalFirecracker`, which uses Firecracker Micro-VMs to execute functions.

This PR also conditionally compiles the VSOCK elements, which are not supported on Mac `M` series chips and some other hosts. This allows the default execution strategy (`LocalProcess`) to be used on all hosts compatible with SI, not just those compatible with the Firecracker-specific elements of the build. Note that, as a result, binaries produced on a host that does not support Firecracker cannot execute on Firecracker, even on another host that does support it.
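As a rough illustration of the conditional compilation described above, the sketch below gates a vsock variant behind `#[cfg(target_os = "linux")]`. The `Transport` enum and both helper functions are hypothetical stand-ins, not the types used in this PR; only the `cfg` predicate is taken from the diff.

```rust
use std::path::PathBuf;

/// Hypothetical transport type: the vsock variant only exists on Linux builds.
#[derive(Debug, Clone, PartialEq)]
pub enum Transport {
    Uds(PathBuf),
    #[cfg(target_os = "linux")]
    Vsock { cid: u32, port: u32 },
}

/// Names of the transports that were compiled into this binary.
pub fn supported_transports() -> Vec<&'static str> {
    let mut transports = vec!["uds"];
    // This statement is removed entirely when building for a non-Linux target.
    #[cfg(target_os = "linux")]
    transports.push("vsock");
    transports
}

/// Render a transport as a short label; the vsock arm is gated the same way
/// as the variant so the match stays exhaustive on every platform.
pub fn describe(transport: &Transport) -> String {
    match transport {
        Transport::Uds(path) => format!("uds:{}", path.display()),
        #[cfg(target_os = "linux")]
        Transport::Vsock { cid, port } => format!("vsock:{cid}:{port}"),
    }
}
```

On a non-Linux build the `Vsock` variant and its match arm are not compiled at all, which is why a binary built there cannot later be pointed at Firecracker.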

Currently the Firecracker host must be local to the rest of the stack, i.e. this PR does not add "remote execution host" functionality to the SI stack, although adding this should be possible by leveraging a router or a similar execution-host manager. Additionally, because of this and because Firecracker has to interact with the host at a privileged level, running the stack as of this PR requires `buck2 run dev` to run as `root` so that it can take the relevant actions.

<hr />

This change will be dead code unless the values of these attributes are set:

Enabling in Cyclone Server:
`lib/deadpool-cyclone/src/instance/cyclone/local_uds.rs`
```
impl Default for LocalUdsRuntimeStrategy {
    fn default() -> Self {
        Self::LocalFirecracker                           // set from Self::LocalProcess
    }
}
```

Enabling in Cyclone Client:
`lib/cyclone-client/src/client.rs`
```
impl Default for ClientConfig {
    fn default() -> Self {
        Self {
            firecracker_connect: true,             // firecracker-setup: set from false
            watch_timeout: Duration::from_secs(10),
        }
    }
}
```

<hr/>

**Kernel and Rootfs Generation:**
- Producing the kernel image at this time is as follows:
  - Clone the kernel repo https://github.com/torvalds/linux at the relevant/chosen git tag
  - Build the kernel image using `make vmlinux` / `make Image`, depending on platform, driving the build with a supported kernel configuration from [here](https://github.com/firecracker-microvm/firecracker/tree/main/resources/guest_configs)

- Producing the rootfs at this time is as follows:
  - Clone the SI repo
  - Run `buck2 run bin/cyclone:image` to generate the cyclone image (this puts it into the Docker daemon on the host)
  - Put the outputted cyclone image reference into the `docker run` stage of the `bin/cyclone/create-firecracker-root-fs.sh` script and execute the script. This outputs the new rootfs into the `/firecracker-data/` folder on the host.

We will migrate the production of both of these artifacts into CI at some point in the future.

<hr/>

**Execution Environment**
- Each Firecracker micro-vm is spawned via the jailer binary, which significantly improves the security posture of the host. More information on exactly what it does is in the [jailer docs](https://github.com/firecracker-microvm/firecracker/blob/main/docs/jailer.md).
- Each micro-vm has its own network namespace and two routes:
  - One out to the internet via the external host interface
  - One VSOCK interface to allow cyclone-server and cyclone-client (veritech) to talk privately via the virtio-vsock kernel module
- Each has its own Linux user, group and root filesystem path on the host, under which it has limited permissions. The path is currently set to the firecracker default, namely `/srv/jailer/firecracker/<id>/root/`.
- Currently we don't pass any cgroups to the jailer, but this can be added in the future for process-level CPU allocation isolation.
- Each execution has its own process namespace.
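The per-VM root path above can be sketched as a small helper. This is a minimal illustration, assuming the layout `<chroot base>/<exec file name>/<id>/root` with the default base `/srv/jailer`. The function name `jail_root` is hypothetical, and the id check (alphanumerics and hyphens, at most 64 characters) is an assumption based on the jailer docs rather than anything in this PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the chroot directory the jailer would use for
/// one micro-vm. Returns `None` when the id is empty, longer than 64
/// characters, or contains anything other than ASCII alphanumerics and
/// hyphens (assumed jailer id rules).
pub fn jail_root(chroot_base: &Path, exec_name: &str, vm_id: &str) -> Option<PathBuf> {
    let id_ok = !vm_id.is_empty()
        && vm_id.len() <= 64
        && vm_id.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    if !id_ok {
        return None;
    }
    Some(chroot_base.join(exec_name).join(vm_id).join("root"))
}
```

Rejecting ids with path separators or dots keeps a caller from building a path that escapes the per-VM directory.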
<hr/>

**Validation on Host**
A test (disabled) has been added within `lib/deadpool-cyclone/src/lib.rs` called `chop`, which validates that the firecracker execution environment on the host running the test works at a high level (it can launch a firecracker VM and establish a ping/pong websocket connection through to cyclone server).

**Prerequisites on Host**
- The host must have the SI repository and all prerequisites to compile and run the code.
- The Firecracker VMM is built to be processor agnostic: 64-bit Intel, AMD and Arm CPUs with hardware virtualization support are generally available for production workloads. Mac `M` series chips are not supported.
- The host must have had the [si-firecracker-config](https://github.com/systeminit/si-firecracker-config/blob/main/machine-userdata.sh) host config executed against it. It's likely that this config will merge into the main `si` monorepo in the near future, to keep the Rust code and the required system configuration together.

**Further Notes**

- There are various TODOs left in the code which can be dealt with at a later point in time:
  - Hardcoded ProcessRuntime: this should be configurable or otherwise driven by external system/environment settings.
  - Killall VMs script (currently we only kill one at a time on a host).
  - Observability deep dive:
    - We have no way of debugging or seeing traces from cyclone-server running within firecracker.
    - We have no real way to detect or monitor for kernel issues (OOMs, etc.).
  - We are using a shared SSH private key on the host to connect to micro-vms. We will likely need to protect this route more formally, using tailscale or similar, in the future, as customer code will be accessible within that context.
  - Build/test pipeline: as mentioned in the kernel and rootfs generation section, we currently create both of these artifacts manually and have limited/zero test coverage over them until we manually check their validity.
  - Shareable remote host: as mentioned in the preamble, this PR only supports a single-tenanted SI stack, as we do not talk remotely to a firecracker-capable host to run the execution.
  - Capacity planning: how many VMs per metal instance, how much memory the VMs need, etc. These calculations and estimations will come as time passes.
  - No formal performance tests have been executed, but the system seems suitably performant at this time.
  - Tying cyclone image build to rootfs generation. Currently this is a manual process whereby 
  - `stop.sh` is not triggered on OOM or a similar firecracker fault, i.e. the firecracker micro-vm API process is left intact/unterminated. From the UI/cyclone-server perspective you get a connection lost error.
  - The provisioning strategy for micro-vms does not support parallelism at this time. The processes will clash when manipulating the host's iptables rules and cause the second one to fail. When we move to parallel execution we will need to make mild adjustments to the pooling/launching strategy from veritech.


Co-authored-by: Scott Prutton <[email protected]>
si-bors-ng[bot] and sprutton1 authored Nov 27, 2023
2 parents a56061b + e8e210f commit b1d1f8a
Showing 21 changed files with 642 additions and 49 deletions.
53 changes: 49 additions & 4 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -134,6 +134,7 @@ tokio-stream = "0.1.14"
tokio-test = "0.4.2"
tokio-tungstenite = "0.18.0"
tokio-util = { version = "0.7.8", features = ["codec"] }
tokio-vsock = { version = "0.4.0"}
toml = { version = "0.7.6" }
tower = "0.4.13"
tower-http = { version = "0.4.0", features = ["cors", "trace"] }
7 changes: 6 additions & 1 deletion bin/cyclone/Dockerfile
@@ -21,6 +21,11 @@ RUN cp -R $(nix-store --query --requisites result/) /tmp/nix-store-closure
# hadolint ignore=SC2046
RUN ln -snf $(nix-store --query result/)/bin/* /tmp/local-bin/

# Add wrapper/entrypoint for `$BIN` to exec `.$BIN`
RUN set -eux; \
mv -v "/tmp/local-bin/$BIN" "/tmp/local-bin/.$BIN"; \
cp -pv /workdir/bin/$BIN/docker-entrypoint.sh "/tmp/local-bin/$BIN";

###########################################################################
# Builder Stage: lang-js
###########################################################################
@@ -47,7 +52,7 @@ RUN ln -snf $(nix-store --query result/)/bin/* /tmp/local-bin/
###########################################################################
# Final Stage
###########################################################################
-FROM alpine:3 AS final
+FROM alpine:3.18 AS final
ARG BIN=cyclone

# hadolint ignore=DL3018
88 changes: 88 additions & 0 deletions bin/cyclone/create-firecracker-root-fs.sh
@@ -0,0 +1,88 @@
#!/bin/bash

# vars
GITROOT="$(git rev-parse --show-toplevel)"
PACKAGEDIR="$GITROOT/cyclone-pkg"
ROOTFS="$PACKAGEDIR/cyclone-rootfs.ext4"
ROOTFSMOUNT="$PACKAGEDIR/rootfs"
GUESTDISK="/rootfs"
INITSCRIPT="$PACKAGEDIR/init.sh"

# create disk and mount to a known location
sudo rm -rf $PACKAGEDIR
mkdir -p $ROOTFSMOUNT
dd if=/dev/zero of=$ROOTFS bs=1M count=1024
mkfs.ext4 $ROOTFS
sudo mount $ROOTFS $ROOTFSMOUNT

# create our script to add an init system to our container image
cat << EOL > $INITSCRIPT
apk update
apk add openrc openssh
ssh-keygen -A
# Make sure special file systems are mounted on boot:
rc-update add devfs boot
rc-update add procfs boot
rc-update add sysfs boot
rc-update add local default
rc-update add networking boot
rc-update add sshd
# Then, copy the newly configured system to the rootfs image:
for d in bin dev etc lib root sbin usr nix; do tar c "/\${d}" | tar x -C ${GUESTDISK}; done
for dir in proc run sys var; do mkdir ${GUESTDISK}/\${dir}; done
# autostart cyclone
cat << EOF > ${GUESTDISK}/etc/init.d/cyclone
#!/sbin/openrc-run
name="cyclone"
description="Cyclone"
supervisor="supervise-daemon"
command="cyclone"
command_args="--bind-vsock 3:52 --decryption-key /dev.decryption.key --lang-server /usr/local/bin/lang-js --enable-watch --limit-requests 1 --watch-timeout 10 --enable-ping --enable-resolver --enable-action-run"
pidfile="/run/agent.pid"
EOF
chmod +x ${GUESTDISK}/usr/local/bin/cyclone
chmod +x ${GUESTDISK}/usr/local/bin/lang-js
chmod +x ${GUESTDISK}/etc/init.d/cyclone
chroot ${GUESTDISK} rc-update add cyclone boot
# networking bits
echo "nameserver 8.8.8.8" > ${GUESTDISK}/etc/resolv.conf
cat << EOZ >${GUESTDISK}/etc/network/interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 10.0.0.1/30
gateway 10.0.0.2
EOZ
EOL

# run the script, mounting the disk so we can create a rootfs
sudo docker run \
-v $ROOTFSMOUNT:$GUESTDISK \
-v $INITSCRIPT:/init.sh \
-it --rm \
--entrypoint sh \
systeminit/cyclone:sha-ef6a35641b2cd8f07475ad5c7a46504883f0a6af-dirty-amd64 \
/init.sh

# lets go find the dev decryption key for now
sudo cp $GITROOT/lib/cyclone-server/src/dev.decryption.key $ROOTFSMOUNT

# cleanup the PACKAGEDIR
sudo umount $ROOTFSMOUNT
rm -rf $ROOTFSMOUNT $INITSCRIPT

# move the package
sudo mv $PACKAGEDIR/cyclone-rootfs.ext4 /firecracker-data/rootfs.ext4

# cleanup
sudo rm -rf $PACKAGEDIR
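The guest network configuration written by this script (`address 10.0.0.1/30`, `gateway 10.0.0.2`) relies on a `/30` holding exactly two usable host addresses, one for the guest and one for the host side. The sketch below works through that arithmetic; the function names are hypothetical and nothing here is part of the PR.

```rust
use std::net::Ipv4Addr;

/// Return the (network, broadcast) addresses of the subnet containing `addr`.
pub fn subnet_bounds(addr: Ipv4Addr, prefix_len: u8) -> (Ipv4Addr, Ipv4Addr) {
    assert!(prefix_len <= 32, "IPv4 prefix length must be at most 32");
    // A prefix of 0 would need a shift by 32 bits, which overflows, so handle it directly.
    let mask: u32 = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - prefix_len)
    };
    let network = u32::from(addr) & mask;
    let broadcast = network | !mask;
    (Ipv4Addr::from(network), Ipv4Addr::from(broadcast))
}

/// True when both addresses fall inside the same subnet for the given prefix.
pub fn same_subnet(a: Ipv4Addr, b: Ipv4Addr, prefix_len: u8) -> bool {
    subnet_bounds(a, prefix_len) == subnet_bounds(b, prefix_len)
}
```

For `10.0.0.1/30` the bounds are `10.0.0.0` and `10.0.0.3`, leaving `10.0.0.1` (guest) and `10.0.0.2` (gateway) as the only host addresses.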
16 changes: 16 additions & 0 deletions bin/cyclone/src/args.rs
@@ -36,6 +36,10 @@ pub(crate) struct Args {
#[arg(long, group = "bind")]
pub(crate) bind_uds: Option<PathBuf>,

/// Binds service to a vsock socket [example: 3:52]
#[arg(long, group = "bind")]
pub(crate) bind_vsock: Option<String>,

/// Enables active/watch behavior.
#[arg(long, group = "watch")]
pub(crate) enable_watch: bool,
@@ -118,6 +122,18 @@ impl TryFrom<Args> for Config {
builder.incoming_stream(IncomingStream::UnixDomainSocket(pathbuf));
}

#[cfg(target_os = "linux")]
if let Some(addr) = args.bind_vsock {
// todo(scott): check the format before attempting to parse
let split = addr.split(':').collect::<Vec<&str>>();

let vsock_addr = cyclone_server::VsockAddr::new(
split[0].parse::<u32>().unwrap(),
split[1].parse::<u32>().unwrap(),
);
builder.incoming_stream(IncomingStream::VsockSocket(vsock_addr));
}

builder.try_lang_server_path(args.lang_server)?;

if args.enable_watch {
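The `--bind-vsock` parsing above carries a todo to check the format before parsing, and it currently panics on input such as `3` or `3:abc`. One possible shape for that check is sketched below. The error type and the function name `parse_vsock_addr` are hypothetical, and the function returns a plain `(cid, port)` tuple rather than a `VsockAddr` so that it stays free of the `tokio-vsock` dependency.

```rust
/// Hypothetical error type for a malformed `CID:PORT` argument.
#[derive(Debug, PartialEq)]
pub enum VsockAddrParseError {
    MissingSeparator,
    InvalidCid(String),
    InvalidPort(String),
}

/// Parse a `CID:PORT` string such as `3:52` into its two numeric parts,
/// returning an error instead of panicking on bad input.
pub fn parse_vsock_addr(input: &str) -> Result<(u32, u32), VsockAddrParseError> {
    // Split on the first ':' only; any further ':' ends up in the port part
    // and is rejected by the numeric parse below.
    let (cid, port) = input
        .split_once(':')
        .ok_or(VsockAddrParseError::MissingSeparator)?;
    let cid = cid
        .parse::<u32>()
        .map_err(|_| VsockAddrParseError::InvalidCid(cid.to_string()))?;
    let port = port
        .parse::<u32>()
        .map_err(|_| VsockAddrParseError::InvalidPort(port.to_string()))?;
    Ok((cid, port))
}
```

A caller could map the error into the existing config error type and pass the two numbers to `VsockAddr::new` as the diff already does.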
7 changes: 7 additions & 0 deletions bin/cyclone/src/main.rs
@@ -61,6 +61,13 @@ async fn run(args: args::Args, mut telemetry: ApplicationTelemetryClient) -> Res
.run()
.await?
}
#[cfg(target_os = "linux")]
IncomingStream::VsockSocket(_) => {
Server::vsock(config, telemetry, decryption_key)
.await?
.run()
.await?
}
}

Ok(())
34 changes: 30 additions & 4 deletions lib/cyclone-client/src/client.rs
@@ -28,7 +28,7 @@ use hyper::{
use hyperlocal::{UnixClientExt, UnixConnector, UnixStream};
use thiserror::Error;
use tokio::{
-io::{AsyncRead, AsyncWrite},
+io::{AsyncRead, AsyncReadExt, AsyncWrite, AsyncWriteExt},
net::TcpStream,
};
use tokio_tungstenite::WebSocketStream;
@@ -50,6 +50,8 @@ pub enum ClientError {
InvalidUri(#[from] InvalidUri),
#[error("invalid websocket uri scheme: {0}")]
InvalidWebsocketScheme(String),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
#[error("missing authority")]
MissingAuthority,
#[error("missing websocket scheme")]
@@ -228,7 +230,7 @@ where
Conn::Error: Into<Box<dyn std::error::Error + Send + Sync>>,
Conn::Future: Unpin + Send,
Strm: AsyncRead + AsyncWrite + Connection + Unpin + Send + Sync + 'static,
-Sock: Send + Sync,
+Sock: Send + Sync + std::fmt::Debug,
{
async fn watch(&mut self) -> Result<Watch<Strm>> {
let stream = self.websocket_stream("/watch").await?;
@@ -328,6 +330,7 @@ where
Conn::Error: Into<Box<dyn std::error::Error + Send + Sync>>,
Conn::Future: Unpin + Send,
Strm: AsyncRead + AsyncWrite + Connection + Unpin + Send + Sync + 'static,
Sock: Send + Sync + std::fmt::Debug,
{
fn http_request_uri<P>(&self, path_and_query: P) -> Result<Uri>
where
@@ -409,20 +412,40 @@ where
self.inner_client.request(req)
}

async fn connect(&mut self, mut stream: Strm) -> Result<Strm> {
let connect_cmd = format!("CONNECT {}\n", 52);
stream.write_all(connect_cmd.as_bytes()).await?;
// We need to read off the response to clear the stream
let mut connect_response = Vec::<u8>::new();
loop {
let mut single_byte = vec![0; 1];
stream.read_exact(&mut single_byte).await?;
connect_response.push(single_byte[0]);
if single_byte == [b'\n'] {
break;
}
}
Ok(stream)
}

async fn websocket_stream<P>(&mut self, path_and_query: P) -> Result<WebSocketStream<Strm>>
where
P: TryInto<PathAndQuery, Error = InvalidUri>,
{
-let stream = self
+let mut stream = self
.connector
.call(self.uri.clone())
.await
.map_err(|err| ClientError::Connect(err.into()))?;

if self.config.firecracker_connect {
stream = self.connect(stream).await?;
}

let uri = self.new_ws_request(path_and_query)?;
let (websocket_stream, response) = tokio_tungstenite::client_async(uri, stream)
.await
.map_err(ClientError::WebsocketConnection)?;

if response.status() != StatusCode::SWITCHING_PROTOCOLS {
return Err(ClientError::UnexpectedStatusCode(response.status()));
}
@@ -433,12 +456,15 @@ where

#[derive(Debug)]
struct ClientConfig {
firecracker_connect: bool,
watch_timeout: Duration,
}

impl Default for ClientConfig {
fn default() -> Self {
Self {
// firecracker-setup: change firecracker_connect to "true"
firecracker_connect: false,
watch_timeout: Duration::from_secs(10),
}
}
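The `connect` method in this file writes `CONNECT 52\n` and then discards bytes up to the first newline. For host-initiated vsock connections, Firecracker's Unix socket is expected to answer with `OK <host port>\n` before forwarding traffic. The sketch below is a synchronous, std-only stand-in for that handshake which also checks the reply instead of discarding it. The function name, the separate reader/writer parameters and the 64-byte reply cap are assumptions made for illustration, not part of the PR.

```rust
use std::io::{self, Read, Write};

/// Send `CONNECT <guest_port>\n` and consume exactly one reply line.
/// Returns the host-side port from an `OK <port>` reply.
pub fn firecracker_connect<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    guest_port: u32,
) -> io::Result<u32> {
    writer.write_all(format!("CONNECT {guest_port}\n").as_bytes())?;
    writer.flush()?;

    // Read one byte at a time so that nothing after the newline is consumed;
    // those bytes belong to the websocket handshake that follows.
    let mut line = Vec::new();
    let mut byte = [0u8; 1];
    loop {
        reader.read_exact(&mut byte)?;
        if byte[0] == b'\n' {
            break;
        }
        line.push(byte[0]);
        if line.len() > 64 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "reply too long"));
        }
    }

    let reply = String::from_utf8_lossy(&line).into_owned();
    let host_port = reply
        .strip_prefix("OK ")
        .and_then(|port| port.trim().parse::<u32>().ok());
    match host_port {
        Some(port) => Ok(port),
        None => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected reply: {reply}"),
        )),
    }
}
```

If the peer closes the connection before sending a full line, `read_exact` returns an `UnexpectedEof` error, which a caller can surface as a failed connect rather than a later websocket error.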
7 changes: 6 additions & 1 deletion lib/cyclone-server/BUCK
@@ -26,7 +26,12 @@ rust_library(
"//third-party/rust:tokio-util",
"//third-party/rust:tower",
"//third-party/rust:tower-http",
-],
+] + select({
"DEFAULT": [],
"config//os:linux": [
"//third-party/rust:tokio-vsock",
],
}),
srcs = glob(["src/**/*.rs"]),
)

3 changes: 3 additions & 0 deletions lib/cyclone-server/Cargo.toml
@@ -29,3 +29,6 @@ tokio-serde = { workspace = true }
tokio-util = { workspace = true }
tower = { workspace = true }
tower-http = { workspace = true }

[target.'cfg(target_os = "linux")'.dependencies]
tokio-vsock = { workspace = true }