network-binding-plugin: add plugin for vhostuser interfaces. #294
Open
bgaussen wants to merge 7 commits into kubevirt:main from Orange-OpenSource:vhostuser-network-binding-plugin

Commits:
- b4a4d97 network-binding-plugin: add plugin for vhostuser interfaces. (bgaussen)
- b3d7cb7 Fix typo, adding precision for device plugin generating a per pod id. (bgaussen)
- 3d900ec Fix typos. (bgaussen)
- 7cbf677 Re-factor design proposal to focus on device plugin proposal (bgaussen)
- 01dfcc6 Update design with device plugin implementation and requirement for d… (bgaussen)
- ad1115b Update design proposal with emptyDir shared dir instead of shareProce… (bgaussen)
- d29a404 Fix some paths (bgaussen)

Binary file added (+214 KB): ...tuser-binding-plugin/kubevirt-vhostuser-binding-plugin-device-plugin.drawio.png

Added file (224 additions, 0 deletions): ...-proposals/network-vhostuser-binding-plugin/network-vhostuser-binding-plugin.md

# Overview
`vhostuser` interfaces are supported by QEMU but not implemented in KubeVirt. The Network Binding Plugin framework is a good fit for adding `vhostuser` interface support to KubeVirt.

## Motivation
`vhostuser` interfaces are required to attach VMs to a userspace dataplane such as OVS-DPDK or VPP, and to achieve a fast datapath from the VM to the physical NIC.
This is a mandatory feature for network function VMs such as vRouters, IPsec gateways, firewalls or SD-WAN VNFs, which usually bind their network interfaces using DPDK. The performance expected with DPDK can only be met if the whole datapath stays in userspace and does not go through kernel interfaces, as it does with regular bridge interfaces.

## Goals
Be able to add `vhostuser` secondary interfaces to the VM definition in KubeVirt.

## Non Goals
Configuring `vhostuser` secondary interfaces in the dataplane is the responsibility of Multus and of the CNI, such as `userspace CNI`.

## Definition of Users
- **VM User** is the persona that configures `VirtualMachine` or `VirtualMachineInstance` resources
- **Cluster Admin** is the persona that configures `KubeVirt` resources
- **Network Binding Plugin Developer** is the persona that implements the `network-vhostuser-binding` plugin
- **CNI Developer** is the persona that implements the CNI that configures the dataplane with vhostuser sockets
- **Dataplane Developer** is the persona that implements the userspace dataplane

## User Stories
- As a VM User, I want to create a VM with one or several `vhostuser` interfaces attached to a userspace dataplane.
- As a VM User, I want the `vhostuser` interface to be configured with a specific MAC address.
- As a VM User, I want to enable multi-queue on the `vhostuser` interface.
- As a VM User, I want to be able to configure the `vhostuser` interface as transitional.
- As a Cluster Admin, I want to be able to enable `network-vhostuser-binding`.
- As a Network Binding Plugin Developer, I want the shared socket path to be accessible to the `virt-launcher` pod.
- As a Dataplane Developer, I want to access all `vhostuser` sockets of VM pods.
- As a CNI Developer, I want to know where `vhostuser` sockets are located.

## Repos
The KubeVirt repository, more specifically [cmd/sidecars](https://github.com/kubevirt/kubevirt/tree/main/cmd/sidecars).

## Design
This proposal leverages the KubeVirt Network Binding Plugin sidecar framework to implement a new `network-vhostuser-binding-plugin`.

The role of `network-vhostuser-binding-plugin` is to modify the domain XML according to the VMI definition, which the `virt-launcher` pod passes to the plugin's gRPC service on the `OnDefineDomain` event triggered by `virt-handler`.

`vhostuser` interfaces are defined in the VMI under `spec/domain/devices/interfaces` using the binding name `vhostuser`:

```yaml
spec:
  domain:
    devices:
      networkInterfaceMultiqueue: true
      interfaces:
      - name: default
        masquerade: {}
      - name: net1
        binding:
          name: vhostuser
        macAddress: ca:fe:ca:fe:42:42
```

On `OnDefineDomain`, `network-vhostuser-binding` translates the VMI definition into the following libvirt domain XML modifications:
1. Create a new interface with `type='vhostuser'`
2. Set the MAC address if one is specified in the VMI spec
3. Set the model type according to the `useVirtioTransitional` VMI spec
4. If `networkInterfaceMultiqueue` is set to `true`, add the number of queues, derived from the number of cores of the VMI
5. Add `memAccess='shared'` to all NUMA cell elements
6. Set the device name according to the KubeVirt naming scheme
7. Set the `vhostuser` socket path, which must stay unchanged across live migration

As KubeVirt can call the `OnDefineDomain` hook multiple times, the modifications made by `network-vhostuser-binding` must be idempotent.
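
The list of modifications above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a minimal, hypothetical illustration, not the plugin's actual code: the function name `add_vhostuser_interface`, its parameters, the hard-coded shared directory and the "one queue per core" rule (inferred from the `cores="8"` / `queues='8'` pair in the example XML) are assumptions. Idempotency is obtained by removing any interface that already carries the same `ua-<network>` alias before adding it again, so a second call returns identical XML. The `rx_queue_size`/`tx_queue_size` attributes of the example are left out.

```python
import xml.etree.ElementTree as ET

# Assumed value of `sharedDir`, taken from the KubeVirt CR example in this proposal.
SHARED_DIR = "/var/run/kubevirt/vhostuser"


def add_vhostuser_interface(domain_xml, network_name, device_name, mac=None,
                            transitional=False, multiqueue=False):
    """Add (or re-add) one vhostuser interface to a libvirt domain XML string.

    Hypothetical helper: name, signature and queue heuristic are illustrative.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")

    # Idempotency: OnDefineDomain may run several times, so first drop any
    # interface previously generated for this network (matched by alias).
    alias_name = "ua-" + network_name
    for iface in devices.findall("interface"):
        alias = iface.find("alias")
        if alias is not None and alias.get("name") == alias_name:
            devices.remove(iface)

    # Step 1, plus steps 6 and 7: vhostuser interface with a KubeVirt-style
    # device name and a socket path under the shared (migration-stable) dir.
    iface = ET.SubElement(devices, "interface", type="vhostuser")
    ET.SubElement(iface, "source", type="unix", mode="server",
                  path=f"{SHARED_DIR}/{network_name}/{device_name}")
    ET.SubElement(iface, "target", dev=device_name)
    # Step 3: transitional or non-transitional virtio model.
    model = "virtio-transitional" if transitional else "virtio-non-transitional"
    ET.SubElement(iface, "model", type=model)
    # Step 2: optional MAC address.
    if mac:
        ET.SubElement(iface, "mac", address=mac)
    # Step 4: one queue per core (assumed rule, matching the example XML).
    if multiqueue:
        topology = root.find("cpu/topology")
        cores = int(topology.get("cores", "1")) if topology is not None else 1
        ET.SubElement(iface, "driver", name="vhost", queues=str(cores))
    ET.SubElement(iface, "alias", name=alias_name)

    # Step 5: vhost-user needs guest memory shared with the dataplane process.
    for cell in root.findall("cpu/numa/cell"):
        cell.set("memAccess", "shared")

    return ET.tostring(root, encoding="unicode")
```

Matching on the alias is only one way to detect an interface added by a previous hook call; any stable key derived from the network name would serve the same purpose.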

Below is an example of modified domain XML:

```xml
<cpu mode="host-model">
  <topology sockets="2" cores="8" threads="1"></topology>
  <numa>
    <cell id="0" cpus="0-7" memory="2097152" unit="KiB" memAccess="shared"/>
    <cell id="1" cpus="8-15" memory="2097152" unit="KiB" memAccess="shared"/>
  </numa>
</cpu>
<interface type='vhostuser'>
  <source type='unix' path='/var/run/kubevirt/vhostuser/net1/poda08a0fcbdea' mode='server'/>
  <target dev='poda08a0fcbdea'/>
  <model type='virtio-non-transitional'/>
  <mac address='ca:fe:ca:fe:42:42'/>
  <driver name='vhost' queues='8' rx_queue_size='1024' tx_queue_size='1024'/>
  <alias name='ua-net1'/>
</interface>
```

### Implementation details

The socket path has to be accessible to both the `virt-launcher` pod (its `compute` container) and the dataplane pod.
To avoid `hostPath` volumes, which require the pod to be privileged, we propose implementing a **vhostuser device plugin** that can inject mounts of the socket directories, as well as annotations, into unprivileged pods.

#### Device Plugin for **vhostuser sockets** resources

Device plugins can instruct kubelet to add mounts to containers when the resources they manage are requested.

This design proposal relies on a device plugin that manages two kinds of resources on the userspace dataplane, which can be thought of as a virtual switch:
- **dataplane**: `1`
  This resource gives access to all subdirectories of `/var/run/vhostuser` and to the sockets inside them.
  It is requested by the dataplane itself.
  Kubelet injects the `/var/run/vhostuser` mount into the container.
- **vhostuser sockets**: `n`
  This resource can be thought of as a virtual switch port, and its count can be capped according to the dataplane's own limitations (performance, CPU, etc.).
  It helps schedule workloads on nodes where the dataplane has available resources.
  It is requested through the resources request of the VM or VMI definition. In turn, the `compute` container of the `virt-launcher` pod requests the same resources.
  This makes the device plugin allocate a subdirectory `/var/run/vhostuser/<socketXX>` and have it mounted into the `virt-launcher` pod.
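
The allocation behaviour of the two resources can be sketched as plain Python functions. In a real device plugin this logic would sit behind the kubelet device plugin gRPC `Allocate` call, and the dictionaries returned here only stand in for the `Mount` entries (`container_path`, `host_path`, `read_only`) of a container allocate response. The function names, the `socketXX`-style device IDs, the `0o770` permissions and mounting each directory at the same path inside the container are assumptions; SELinux labelling is not covered.

```python
import os
import tempfile

# Base directory from the proposal; overridable so the sketch can run anywhere.
VHOSTUSER_BASE = "/var/run/vhostuser"


def allocate_sockets(device_ids, base_dir=VHOSTUSER_BASE, mode=0o770):
    """Hypothetical Allocate logic for the `vhostuser sockets` resource.

    Creates one subdirectory per allocated device and returns one mount per
    directory, mounted at the same path inside the container (assumption).
    """
    mounts = []
    for device_id in device_ids:
        host_path = os.path.join(base_dir, device_id)
        os.makedirs(host_path, exist_ok=True)
        # makedirs() applies the process umask, so set permissions explicitly.
        os.chmod(host_path, mode)
        mounts.append({"container_path": host_path,
                       "host_path": host_path,
                       "read_only": False})
    return {"mounts": mounts}


def allocate_dataplane(base_dir=VHOSTUSER_BASE):
    """Hypothetical Allocate logic for the `dataplane` resource: the whole tree."""
    os.makedirs(base_dir, exist_ok=True)
    return {"mounts": [{"container_path": base_dir,
                        "host_path": base_dir,
                        "read_only": False}]}


if __name__ == "__main__":
    demo_base = tempfile.mkdtemp()  # stand-in for /var/run/vhostuser
    print(allocate_sockets(["socket01"], base_dir=demo_base))
    print(allocate_dataplane(base_dir=demo_base))
```

Giving each `vhostuser sockets` device its own directory is what keeps one VM pod from seeing another VM's sockets, while the single `dataplane` device deliberately exposes the whole tree.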

The device plugin has to comply with [`device-info-spec`](https://github.com/k8snetworkplumbingwg/device-info-spec/blob/main/SPEC.md#device-information-specification). This allows the device plugin and the CNI to share information. Because Multus complies with this spec, the CNI can retrieve the device information (socket path and type) and use it to configure the dataplane accordingly. Multus annotates the `virt-launcher` pod with this information, and KubeVirt extracts part of it into the `kubevirt.io/network-info` annotation.
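
As an illustration of the device-info-spec side, the sketch below writes one device-info file for an allocated socket. The `vhost-user` type with its `mode` and `path` fields and the `/var/run/k8s.cni.cncf.io/devinfo/dp` directory follow our reading of the spec; the file-name pattern, the spec version string, the `server` mode default and the helper name `write_vhostuser_device_info` are assumptions to be checked against the specification. How the device plugin learns the final socket file name is left open here.

```python
import json
import os
import tempfile

# Directory that device-info-spec reserves for files written by device plugins.
DP_DEVINFO_DIR = "/var/run/k8s.cni.cncf.io/devinfo/dp"


def write_vhostuser_device_info(resource_name, device_id, socket_path,
                                mode="server", devinfo_dir=DP_DEVINFO_DIR):
    """Write a device-info file describing one vhostuser socket device.

    Hypothetical helper; file name pattern and version are assumptions.
    """
    info = {
        "type": "vhost-user",
        "version": "1.1.0",  # assumed spec version
        "vhost-user": {"mode": mode, "path": socket_path},
    }
    os.makedirs(devinfo_dir, exist_ok=True)
    # Resource names contain "/", which cannot appear in a file name.
    file_name = "{}-{}-device.json".format(resource_name.replace("/", "-"), device_id)
    path = os.path.join(devinfo_dir, file_name)
    with open(path, "w") as f:
        json.dump(info, f)
    return path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()  # stand-in for the real devinfo directory
    print(write_vhostuser_device_info("vhostuser/sockets", "socket01",
                                      "/var/run/vhostuser/socket01/poda08a0fcbdea",
                                      devinfo_dir=out_dir))
```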

The device plugin has to take care of directory permissions and SELinux labels so that the sockets are accessible from the requesting pods.

#### Network Binding Plugin and KubeVirt requirements

The Network Binding Plugin can then leverage the `downwardAPI` feature, available since KubeVirt v1.3.0, to retrieve the `kubevirt.io/network-info` annotation value and extract the socket path used to configure the interface in the domain XML.
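
A hypothetical helper for the binding plugin side could parse the `kubevirt.io/network-info` value (for example as exposed through the downward API) and return the socket path reported for one network. The JSON layout it expects, an `interfaces` list whose entries carry a `network` name and a device-info `deviceInfo` object, is an assumption based on device-info-spec and should be verified against KubeVirt's downward API documentation; the function name is illustrative.

```python
import json


def vhostuser_socket_path(network_info, network_name):
    """Return the vhostuser socket path reported for `network_name`, or None.

    Assumes a `kubevirt.io/network-info` value shaped like
    {"interfaces": [{"network": ..., "deviceInfo": {<device-info object>}}]}.
    """
    info = json.loads(network_info)
    for iface in info.get("interfaces", []):
        if iface.get("network") != network_name:
            continue
        device_info = iface.get("deviceInfo") or {}
        if device_info.get("type") == "vhost-user":
            return device_info.get("vhost-user", {}).get("path")
    return None


if __name__ == "__main__":
    annotation = json.dumps({"interfaces": [{
        "network": "net1",
        "deviceInfo": {"type": "vhost-user", "version": "1.1.0",
                       "vhost-user": {"mode": "server",
                                      "path": "/var/run/vhostuser/socket01/sock"}},
    }]})
    print(vhostuser_socket_path(annotation, "net1"))
```

The path returned here is the per-pod, unpredictable one; it is the target of the indirection discussed next, not the path that should end up in the domain XML.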

However, it cannot use that path directly, as doing so would break live migration of VMs.
The socket directories `/var/run/vhostuser/<socketXX>` are not predictable, and new ones are allocated when the destination pod is created.
Unfortunately, the domain XML used on the destination is the one from the source pod (migration domain), and it references the socket paths allocated to the source pod.

Hence, the Network Binding Plugin needs immutable paths to the sockets. This can be achieved with symbolic links named after the interface (or its hashed name) that point to the real socket directory: `/var/run/kubevirt/vhostuser/net1` -> `/var/run/vhostuser/<socketXX>`.
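
The symbolic-link indirection can be sketched with the Python standard library. `link_socket_dir` (a hypothetical name) creates or refreshes `<sharedDir>/<network>` so that it points at the directory allocated by the device plugin. It creates a temporary link and then renames it over the old one with `os.replace`, so repeated calls (for example on the migration target) update the link atomically. `hashed_name` shows the hashed alternative; its `pod` + 11 hex digits of SHA-256 format is an assumption based on the `poda08a0fcbdea` name in the domain XML example and on our understanding of KubeVirt's interface naming.

```python
import hashlib
import os
import tempfile


def hashed_name(network_name):
    """'pod' + first 11 hex digits of SHA-256(network name) (assumed scheme)."""
    return "pod" + hashlib.sha256(network_name.encode()).hexdigest()[:11]


def link_socket_dir(shared_dir, link_name, socket_dir):
    """Make shared_dir/link_name point at socket_dir; safe to call repeatedly."""
    os.makedirs(shared_dir, exist_ok=True)
    link = os.path.join(shared_dir, link_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(socket_dir, tmp)
    # rename(2) replaces an existing symlink atomically, so readers never
    # observe a missing link while it is being refreshed.
    os.replace(tmp, link)
    return link


if __name__ == "__main__":
    shared = tempfile.mkdtemp()  # stand-in for /var/run/kubevirt/vhostuser
    target = tempfile.mkdtemp()  # stand-in for /var/run/vhostuser/<socketXX>
    print(link_socket_dir(shared, "net1", target), "->", target)
    print(hashed_name("net1"))
```

Because the link name only depends on the network name, the domain XML built on the source pod stays valid on the destination pod once the link there points at the newly allocated directory.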

This requires an enhancement in KubeVirt, and in the Network Binding Plugin spec of the KubeVirt CR, so that the `virt-launcher` pod gets a shared `emptyDir` volume mounted in both the `compute` and `vhostuser-network-binding-plugin` containers.

#### Implementation diagram

![kubevirt-vhostuser-shared-sockets](kubevirt-vhostuser-binding-plugin-device-plugin.drawio.png)

## API Examples

### KubeVirt CRD

A new parameter for the shared directory must be defined in the Network Binding Plugin spec of the KubeVirt CR:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
spec:
  configuration:
    network:
      binding:
        vhostuser:
          sidecarImage: network-vhostuser-binding:main
          sharedDir: /var/run/kubevirt/vhostuser
```

### No modification to VM

Example of a `VirtualMachine` definition using the `network-vhostuser-binding` plugin and device plugin resource requests:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vhostuser-vm
  namespace: tests
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/domain: vhostuser-vm
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 4
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          interfaces:
          - masquerade: {}
            name: default
          - binding:
              name: vhostuser
            macAddress: ca:fe:ca:fe:42:42
            name: net1
          networkInterfaceMultiqueue: true
        machine:
          type: q35
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          limits:
            vhostuser/sockets: 1
          requests:
            memory: 2Gi
            vhostuser/sockets: 1
      networks:
      - name: default
        pod: {}
      - multus:
          networkName: vhostuser-network
        name: net1
      nodeSelector:
        node-class: dpdk
      volumes:
      - containerDisk:
          image: os-container-disk-40g
        name: containerdisk
```

## Scalability
(overview of how the design scales)

## Update/Rollback Compatibility
The KubeVirt Network Binding Plugin relies on the `hooks/v1alpha3` API for a clean termination of the `network-vhostuser-binding` container in the `virt-launcher` pod.

## Functional Testing Approach
Create a VM with several `vhostuser` interfaces, then:
- check that the generated domain XML contains all interfaces with the appropriate configuration
- check that the vhostuser sockets are created in the expected directory of the `virt-launcher` pod
- check that the vhostuser sockets are available to the dataplane pod
- check that the VM is running
- check VM network connectivity
- live migrate the VM
- check that the VM is migrated and running
- check VM network connectivity
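
The first check in the list above could be automated along these lines. `check_vhostuser_interfaces` is a hypothetical test helper: it parses the domain XML (dumped from the `virt-launcher` pod, for instance), matches interfaces through their `ua-<network>` alias as in the domain XML example, and reports missing interfaces, socket paths outside the assumed shared directory, unexpected MAC addresses and NUMA cells lacking `memAccess='shared'`.

```python
import xml.etree.ElementTree as ET


def check_vhostuser_interfaces(domain_xml, expected,
                               shared_dir="/var/run/kubevirt/vhostuser"):
    """Return a list of problems; `expected` maps network name -> MAC or None."""
    root = ET.fromstring(domain_xml)

    # Index vhostuser interfaces by network name, taken from the ua-<name> alias.
    found = {}
    for iface in root.findall("devices/interface[@type='vhostuser']"):
        alias = iface.find("alias")
        if alias is not None and alias.get("name", "").startswith("ua-"):
            found[alias.get("name")[3:]] = iface

    problems = []
    for net, mac in expected.items():
        iface = found.get(net)
        if iface is None:
            problems.append(f"{net}: missing vhostuser interface")
            continue
        source = iface.find("source")
        if source is None or not source.get("path", "").startswith(f"{shared_dir}/{net}/"):
            problems.append(f"{net}: socket not under {shared_dir}/{net}")
        if mac is not None:
            mac_el = iface.find("mac")
            if mac_el is None or mac_el.get("address") != mac:
                problems.append(f"{net}: unexpected MAC")

    # vhost-user requires shared guest memory on every NUMA cell.
    for cell in root.findall("cpu/numa/cell"):
        if cell.get("memAccess") != "shared":
            problems.append(f"NUMA cell {cell.get('id')}: memAccess is not shared")
    return problems
```

An empty list means the domain matches the expectations; collecting problems instead of failing on the first one lets a single test run report every misconfigured interface.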

# Implementation Phases
- [ ] Implement the network binding plugin `sharedDir` spec in KubeVirt
- [x] First implementation of the `network-vhostuser-binding`
- [x] Implement the vhostuser device plugin, based on [generic-device-plugin](https://github.com/squat/generic-device-plugin)
- [ ] Upstream `network-vhostuser-binding`


**Review comment:** Which kinds of networks is the plugin going to support? (pod network, secondary networks)

**Reply:** The plugin is designed to support secondary networks.