Merge pull request systemd#29783 from CodethinkLabs/vmspawn/notify-socket-forward-pr

vmspawn - forward messages to notify socket forward
bluca authored Nov 9, 2023
2 parents 52c7727 + 6b30cad commit 4287498
Showing 5 changed files with 572 additions and 72 deletions.
86 changes: 64 additions & 22 deletions man/systemd-vmspawn.xml
@@ -34,6 +34,8 @@
<para><command>systemd-vmspawn</command> may be used to start a virtual machine from an OS image. In many ways it is similar to <citerefentry
project='man-pages'><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>, but it
launches a full virtual machine instead of using namespaces.</para>

<para>Note: on Ubuntu/Debian derivatives systemd-vmspawn requires the user to be in the <literal>kvm</literal> group to use the VSock options.</para>
</refsect1>

<refsect1>
@@ -83,37 +85,77 @@
</listitem>
</varlistentry>

<varlistentry>
<term><option>--qemu-kvm=</option></term>
<varlistentry>
<term><option>--qemu-kvm=</option><replaceable>BOOL</replaceable></term>

<listitem><para>Configures whether to use KVM. If the option is not specified KVM support will be
detected automatically. If true, KVM is always used, and if false, KVM is never used.</para>
<listitem><para>Configures whether to use KVM. If the option is not specified KVM support will be
detected automatically. If true, KVM is always used, and if false, KVM is never used.</para>

<xi:include href="version-info.xml" xpointer="v255"/></listitem>
</varlistentry>
<xi:include href="version-info.xml" xpointer="v255"/></listitem>
</varlistentry>

<varlistentry>
<term><option>--qemu-gui</option></term>
<varlistentry>
<term><option>--qemu-vsock=</option><replaceable>BOOL</replaceable></term>

<listitem><para>Start QEMU in graphical mode.</para>
<listitem>
<para>Configure whether to use VSock networking.</para>
<para>If the option is not specified, VSock support will be detected automatically.
If yes is specified, VSock is always used; if no is specified, VSock is never used.</para>
<xi:include href="version-info.xml" xpointer="v256"/>
</listitem>
</varlistentry>

<xi:include href="version-info.xml" xpointer="v255"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--vsock-cid=</option><replaceable>CID</replaceable></term>

<varlistentry>
<term><option>--secboot=</option></term>
<listitem>
<para>Configure vmspawn to use a specific CID for the guest.</para>
<para>If the option is not specified or an empty argument is supplied, the guest will be assigned a random CID.</para>
<para>Valid CIDs are in the range <constant>3</constant> to <constant>4294967294</constant> (<constant>0xFFFF_FFFE</constant>).
CIDs outside of this range are reserved.</para>
<xi:include href="version-info.xml" xpointer="v256"/>
</listitem>
</varlistentry>

<listitem><para>Configures whether to search for firmware which supports secure boot. If the option
is not specified, the first firmware which is detected will be used. If true, then the first
firmware with secure boot support will be selected. If false, then the first firmware without
secure boot will be selected.</para>
<varlistentry>
<term><option>--qemu-gui</option></term>

<xi:include href="version-info.xml" xpointer="v255"/></listitem>
<listitem><para>Start QEMU in graphical mode.</para>

<xi:include href="version-info.xml" xpointer="v255"/></listitem>
</varlistentry>

<varlistentry>
<term><option>--secure-boot=</option><replaceable>BOOL</replaceable></term>

<listitem><para>Configure whether to search for firmware which supports Secure Boot.</para>
<para>If the option is not specified, the first firmware which is detected will be used.
If the option is set to yes, then the first firmware with Secure Boot support will be selected.
If no is specified, then the first firmware without Secure Boot will be selected.</para>

<xi:include href="version-info.xml" xpointer="v255"/></listitem>
</varlistentry>
</variablelist>
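The --vsock-cid= rules above (an empty argument means "pick a CID", otherwise the value must lie between 3 and 4294967294) can be expressed as a small validation routine. The following is a minimal C sketch under stated assumptions: parse_vsock_cid(), SKETCH_CID_ANY, CID_MIN and CID_MAX are hypothetical names rather than systemd's own parser, and only plain decimal input is accepted (the 0xFFFF_FFFE spelling in the text is treated as notation only).

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for VMADDR_CID_ANY (-1U in <linux/vm_sockets.h>). */
#define SKETCH_CID_ANY UINT32_MAX

/* Documented valid range; CIDs outside it are reserved. */
#define CID_MIN 3u
#define CID_MAX 0xFFFFFFFEu /* 4294967294 */

/* Parse a --vsock-cid= value (hypothetical helper, decimal only).
 * An empty string means "let the tool choose" and yields SKETCH_CID_ANY.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE if out of range. */
static int parse_vsock_cid(const char *s, uint32_t *ret) {
        if (!s || !ret)
                return -EINVAL;
        if (s[0] == '\0') {
                *ret = SKETCH_CID_ANY;
                return 0;
        }
        /* strtoull() would accept a leading '-', '+' or whitespace; reject them. */
        if (!isdigit((unsigned char) s[0]))
                return -EINVAL;

        char *end;
        errno = 0;
        unsigned long long v = strtoull(s, &end, 10);
        if (errno != 0)
                return -errno;         /* overflow of unsigned long long */
        if (*end != '\0')
                return -EINVAL;        /* trailing garbage such as "12abc" */
        if (v < CID_MIN || v > CID_MAX)
                return -ERANGE;        /* reserved CID */

        *ret = (uint32_t) v;
        return 0;
}
```

Rejecting reserved values at parse time lets a tool report a clear error before it ever touches /dev/vhost-vsock.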

</refsect2><refsect2>
<title>System Identity Options</title>

<variablelist>
<varlistentry>
<term><option>-M</option></term>
<term><option>--machine=</option></term>

<listitem><para>Sets the machine name for this virtual machine. This
name may be used to identify this virtual machine during its runtime
(for example in tools like
<citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry>
and similar).</para>
<xi:include href="version-info.xml" xpointer="v256"/>
</listitem>
</varlistentry>
</variablelist>

</refsect2>
<refsect2>
</refsect2><refsect2>
<title>Credentials</title>

<variablelist>
@@ -135,8 +177,7 @@
</varlistentry>
</variablelist>

</refsect2>
<refsect2>
</refsect2><refsect2>
<title>Other</title>

<variablelist>
@@ -166,6 +207,7 @@ $ systemd-vmspawn --image=image.raw
<title>Exit status</title>

<para>If an error occurred, the value of errno is propagated to the return code.
If EXIT_STATUS is supplied by the running image, that value is returned.
Otherwise, EXIT_SUCCESS is returned.</para>
</refsect1>

1 change: 1 addition & 0 deletions man/version-info.xml
@@ -77,4 +77,5 @@
<para id="v253">Added in version 253.</para>
<para id="v254">Added in version 254.</para>
<para id="v255">Added in version 255.</para>
<para id="v256">Added in version 256.</para>
</refsect1>
109 changes: 107 additions & 2 deletions src/vmspawn/vmspawn-util.c
@@ -1,9 +1,9 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */

#include <stdio.h>
#include <unistd.h>
#include <linux/vhost.h>
#include <sys/ioctl.h>

#include "alloc-util.h"
#include "architecture.h"
#include "conf-files.h"
#include "errno-util.h"
@@ -15,7 +15,10 @@
#include "memory-util.h"
#include "path-lookup.h"
#include "path-util.h"
#include "random-util.h"
#include "recurse-dir.h"
#include "siphash24.h"
#include "socket-util.h"
#include "sort-util.h"
#include "string-util.h"
#include "strv.h"
@@ -45,6 +48,32 @@ int qemu_check_kvm_support(void) {
return -errno;
}

int qemu_check_vsock_support(void) {
_cleanup_close_ int fd = -EBADF;
/* Using access() alone only checks whether the device node exists, not whether a
* device driver is behind it (a common case, since systemd-tmpfiles typically creates
* the device node at boot).
*
* Hence we open() the path to see if there's actually something behind it.
*
* If there isn't, open() should fail with ENODEV.
*/

fd = open("/dev/vhost-vsock", O_RDWR|O_CLOEXEC);
if (fd >= 0)
return true;
if (errno == ENODEV) {
log_debug_errno(errno, "/dev/vhost-vsock device doesn't exist. Not adding a vsock device to the virtual machine.");
return false;
}
if (errno == EACCES || errno == EPERM) {
log_debug_errno(errno, "Permission denied to access /dev/vhost-vsock. Not adding a vsock device to the virtual machine.");
return false;
}

return -errno;
}

/* holds the data retrieved from the QEMU firmware interop JSON data */
typedef struct FirmwareData {
char **features;
@@ -237,3 +266,79 @@ int find_qemu_binary(char **ret_qemu_binary) {

return find_executable(qemu_arch_specific, ret_qemu_binary);
}

int vsock_fix_child_cid(unsigned *machine_cid, const char *machine, int *ret_child_sock) {
/* this is an arbitrary value picked from /dev/urandom */
static const uint8_t sip_key[HASH_KEY_SIZE] = {
0x03, 0xad, 0xf0, 0xa4,
0x59, 0x2c, 0x77, 0x11,
0xda, 0x39, 0x0c, 0xba,
0xf5, 0x4c, 0x80, 0x52
};
struct siphash machine_hash_state, state;
_cleanup_close_ int vfd = -EBADF;
int r;

/* uint64_t is required here for the ioctl call, but valid CIDs are only 32 bits */
uint64_t cid = *ASSERT_PTR(machine_cid);

assert(machine);
assert(ret_child_sock);

/* Fix the CID of the vhost-vsock device whose fd is passed to QEMU.
*
* If the user has passed us a CID (machine_cid != VMADDR_CID_ANY), then attempt to bind to that CID
* and fail if we cannot.
*
* Otherwise, hash the machine name to get a pseudo-random CID and attempt to bind to that.
* If it is occupied, mix more information into the hash and try again.
* If this hasn't worked after 64 attempts, fall back to truly random CIDs.
* If this hasn't worked after another 64 attempts, give up and return EADDRNOTAVAIL.
*/

/* remove O_CLOEXEC before this fd is passed to QEMU */
vfd = open("/dev/vhost-vsock", O_RDWR|O_CLOEXEC);
if (vfd < 0)
return log_debug_errno(errno, "Failed to open /dev/vhost-vsock as read/write: %m");

if (cid != VMADDR_CID_ANY) {
r = ioctl(vfd, VHOST_VSOCK_SET_GUEST_CID, &cid);
if (r < 0)
return log_debug_errno(errno, "Failed to set CID for child vsock with user provided CID %" PRIu64 ": %m", cid);
*ret_child_sock = TAKE_FD(vfd);
return 0;
}

siphash24_init(&machine_hash_state, sip_key);
siphash24_compress_string(machine, &machine_hash_state);
for (unsigned i = 0; i < 64; i++) {
state = machine_hash_state;
siphash24_compress_safe(&i, sizeof i, &state);
uint64_t hash = siphash24_finalize(&state);

cid = 3 + (hash % (UINT_MAX - 4));
r = ioctl(vfd, VHOST_VSOCK_SET_GUEST_CID, &cid);
if (r >= 0) {
*machine_cid = cid;
*ret_child_sock = TAKE_FD(vfd);
return 0;
}
if (errno != EADDRINUSE)
return -errno;
}

for (unsigned i = 0; i < 64; i++) {
cid = 3 + random_u64_range(UINT_MAX - 4);
r = ioctl(vfd, VHOST_VSOCK_SET_GUEST_CID, &cid);
if (r >= 0) {
*machine_cid = cid;
*ret_child_sock = TAKE_FD(vfd);
return 0;
}

if (errno != EADDRINUSE)
return -errno;
}

return log_debug_errno(SYNTHETIC_ERRNO(EADDRNOTAVAIL), "Failed to assign a CID to the guest vsock");
}
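vsock_fix_child_cid() uses the caller's CID if one was given; otherwise it tries up to 64 candidates derived from a keyed SipHash of the machine name, then up to 64 random ones, treating EADDRINUSE as "try the next one". The following C sketch reproduces that control flow outside systemd so it can be exercised without /dev/vhost-vsock. Assumptions: the hypothetical claim_cid_fn callback stands in for ioctl(VHOST_VSOCK_SET_GUEST_CID), unkeyed FNV-1a replaces SipHash, and the caller supplies the random source. One deliberate difference: to_cid() maps onto the full documented range 3 to 0xFFFFFFFE, whereas 3 + (hash % (UINT_MAX - 4)) in the diff tops out at 0xFFFFFFFD.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CID_MIN 3u
#define CID_MAX 0xFFFFFFFEu
#define N_ATTEMPTS 64u

/* Hypothetical: try to claim a CID. Returns 0 on success, -EADDRINUSE if the
 * CID is taken, or another negative errno on a hard error. */
typedef int (*claim_cid_fn)(uint32_t cid, void *userdata);

/* FNV-1a (64-bit), used here in place of the keyed SipHash upstream. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t n) {
        const unsigned char *p = data;
        for (size_t i = 0; i < n; i++) {
                h ^= p[i];
                h *= 0x100000001b3ULL;
        }
        return h;
}

/* Map any 64-bit value onto [CID_MIN, CID_MAX] inclusive. */
static uint32_t to_cid(uint64_t v) {
        return CID_MIN + (uint32_t) (v % ((uint64_t) CID_MAX - CID_MIN + 1));
}

static int pick_cid(const char *name, claim_cid_fn claim, void *userdata,
                    uint64_t (*rng)(void), uint32_t *ret) {
        const uint64_t base = fnv1a(0xcbf29ce484222325ULL, name, strlen(name));

        /* Phase 1: deterministic candidates, hash(name || attempt index). */
        for (uint32_t i = 0; i < N_ATTEMPTS; i++) {
                uint32_t cid = to_cid(fnv1a(base, &i, sizeof i));
                int r = claim(cid, userdata);
                if (r == 0) {
                        *ret = cid;
                        return 0;
                }
                if (r != -EADDRINUSE)
                        return r;
        }

        /* Phase 2: random candidates from the caller's source. */
        for (uint32_t i = 0; i < N_ATTEMPTS; i++) {
                uint32_t cid = to_cid(rng());
                int r = claim(cid, userdata);
                if (r == 0) {
                        *ret = cid;
                        return 0;
                }
                if (r != -EADDRINUSE)
                        return r;
        }

        return -EADDRNOTAVAIL;
}
```

A plausible reason for the deterministic first phase is that a given machine name then tends to get the same CID across runs; the random phase only matters when all name-derived candidates are already taken.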
2 changes: 2 additions & 0 deletions src/vmspawn/vmspawn-util.h
@@ -20,5 +20,7 @@ OvmfConfig* ovmf_config_free(OvmfConfig *ovmf_config);
DEFINE_TRIVIAL_CLEANUP_FUNC(OvmfConfig*, ovmf_config_free);

int qemu_check_kvm_support(void);
int qemu_check_vsock_support(void);
int find_ovmf_config(int search_sb, OvmfConfig **ret_ovmf_config);
int find_qemu_binary(char **ret_qemu_binary);
int vsock_fix_child_cid(unsigned *machine_cid, const char *machine, int *ret_child_sock);