The Chaos Controller provides chaos testing for Kubernetes and supports a rich set of failure scenarios. It relies on Linux and Docker functionality to inject network partitions and stress nodes.
A Helm chart is provided for setting up the controller. To deploy the controller, run `helm install helm` from the project root:

```bash
helm install helm
```
When the chart is installed, the following custom resources will be added to the cluster:

* `ChaosMonkey`
* `Crash`
* `NetworkPartition`
* `Stress`
The `ChaosMonkey` resource is the primary resource provided by the controller. The remaining custom resources are used by the controller to inject specific failures into pods.
The chart supports overrides for both the controller and the workers. The controller is deployed as a `Deployment`, and the workers as a `DaemonSet`.
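For example, images or other settings could be overridden at install time with a custom values file. The keys below (`controller.image`, `worker.image`) are hypothetical and should be checked against the chart's `values.yaml`:

```yaml
# values-override.yaml -- hypothetical keys; consult the chart's values.yaml
controller:
  image: example/chaos-controller:latest   # placeholder image name
worker:
  image: example/chaos-worker:latest       # placeholder image name
```

The overrides can then be applied with `helm install helm -f values-override.yaml`.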
Before running the controller, register the custom resources:

```bash
$ kubectl create -f deploy/chaosmonkey.yaml
$ kubectl create -f deploy/crash.yaml
$ kubectl create -f deploy/networkpartition.yaml
$ kubectl create -f deploy/stress.yaml
```
Set up RBAC and deploy the controller:

```bash
$ kubectl create -f deploy/service_account.yaml
$ kubectl create -f deploy/role.yaml
$ kubectl create -f deploy/role_binding.yaml
$ kubectl create -f deploy/controller.yaml
```
Deploy the workers:

```bash
$ kubectl create -f deploy/workers.yaml
```
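To verify the deployment, standard `kubectl` queries can be used; the label selector below is only illustrative, since the actual labels depend on the manifests in `deploy/`:

```bash
# Check that the controller Deployment and worker DaemonSet are up.
$ kubectl get deployments,daemonsets

# The label here is a guess; inspect deploy/controller.yaml and
# deploy/workers.yaml for the labels actually applied.
$ kubectl get pods -l app=chaos
```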
Example `ChaosMonkey` resources can be found in the `example` directory:

```bash
$ kubectl create -f example/crash_monkey.yaml
$ kubectl create -f example/partition_monkey.yaml
$ kubectl create -f example/stress_monkey.yaml
```
The chaos controller provides a full suite of tools for chaos testing, injecting a variety of failures into Kubernetes nodes, pods, and networks. Each monkey plays a specific role in injecting failures into the cluster:
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: crash-monkey
spec:
  rateSeconds: 60
  jitter: .5
  crash:
    crashStrategy:
      type: Container
```
The scheduling of periodic `ChaosMonkey` executions can be managed by providing a rate and period for which the fault occurs (a combined example follows this list):

* `rateSeconds` - the number of seconds to wait between monkey runs
* `periodSeconds` - the number of seconds for which to run a monkey, e.g. the amount of time for which to partition the network or stress a node
* `jitter` - the amount of jitter to apply to the rate
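For instance, a monkey that runs every ten minutes for two minutes at a time, with 50% jitter on the rate, might look like the following sketch (the monkey name is arbitrary):

```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: scheduled-partition-monkey   # arbitrary example name
spec:
  rateSeconds: 600     # wait 10 minutes between runs
  periodSeconds: 120   # run each fault for 2 minutes
  jitter: .5           # apply 50% jitter to the rate
  partition:
    partitionStrategy:
      type: Isolate
```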
Specific sets of pods can be selected using pod names, labels, or match expressions specified in the configured `selector`:

* `matchPods` - a list of pod names on which to match
* `matchLabels` - a map of label names and values on which to match pods
* `matchExpressions` - label match expressions on which to match pods
Selector options can be added on a per-monkey basis:
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: crash-monkey
spec:
  crash:
    crashStrategy:
      type: Pod
  selector:
    matchPods:
    - pod-1
    - pod-2
    - pod-3
    matchLabels:
      group: raft
    matchExpressions:
    - key: group
      operator: In
      values:
      - raft
      - data
```
Each monkey type has its own configuration, provided by a field named after the monkey type:
The crash monkey can be used to inject node crashes into the cluster. To configure a crash monkey, use the `crash` configuration:
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: crash-monkey
spec:
  rateSeconds: 60
  jitter: .5
  crash:
    crashStrategy:
      type: Container
```
The `crash` configuration supports a `crashStrategy` with the following options (an example using the `Pod` strategy follows):

* `Container` - kills the process running inside the container
* `Pod` - deletes the `Pod` using the Kubernetes API
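For example, a crash monkey using the `Pod` strategy and restricted to a labeled set of pods could be sketched as follows; the name and label are illustrative:

```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: pod-crash-monkey      # arbitrary example name
spec:
  rateSeconds: 120
  crash:
    crashStrategy:
      type: Pod               # delete the pod via the Kubernetes API
  selector:
    matchLabels:
      group: raft             # illustrative label, as in the selector example above
```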
The partition monkey can be used to cut off network communication between a set of pods. To use the partition monkey, selected pods must have `iptables` installed. To configure a partition monkey, use the `partition` configuration:
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: partition-isolate-monkey
spec:
  rateSeconds: 600
  periodSeconds: 120
  partition:
    partitionStrategy:
      type: Isolate
```
The `partition` configuration supports a `partitionStrategy` with the following options (a `Bridge` example follows):

* `Isolate` - isolates a single random node in the cluster from all other nodes
* `Halves` - splits the cluster into two halves
* `Bridge` - splits the cluster into two halves with a single bridge node able to communicate with each half (for testing consensus)
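A `Bridge` partition monkey scoped to a Raft group might look like the sketch below; the name and label are illustrative:

```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: partition-bridge-monkey   # arbitrary example name
spec:
  rateSeconds: 600
  periodSeconds: 120
  partition:
    partitionStrategy:
      type: Bridge                # leave one node able to talk to both halves
  selector:
    matchLabels:
      group: raft                 # illustrative label
```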
The stress monkey uses a variety of tools to simulate stress on nodes and on the network. To configure a stress monkey, use the `stress` configuration:
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: stress-cpu-monkey
spec:
  rateSeconds: 300
  periodSeconds: 300
  stress:
    stressStrategy:
      type: All
    cpu:
      workers: 2
```
The `stress` configuration supports a `stressStrategy` with the following options:

* `Random` - applies stress options to a random pod
* `All` - applies stress options to all pods in the cluster
The stress monkey supports a variety of types of stress using the `stress` tool:

* `cpu` - spawns `cpu.workers` workers spinning on `sqrt()`
* `io` - spawns `io.workers` workers spinning on `sync()`
* `memory` - spawns `memory.workers` workers spinning on `malloc()`/`free()`
* `hdd` - spawns `hdd.workers` workers spinning on `write()`/`unlink()`
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: stress-all-monkey
spec:
  rateSeconds: 300
  periodSeconds: 300
  stress:
    stressStrategy:
      type: Random
    cpu:
      workers: 2
    io:
      workers: 2
    memory:
      workers: 4
    hdd:
      workers: 1
```
Additionally, network latency can be injected using the stress monkey via traffic control by providing a `network` stress configuration:

* `latencyMilliseconds` - the amount of latency to inject in milliseconds
* `jitter` - the jitter to apply to the latency
* `correlation` - the correlation to apply to the latency
* `distribution` - the delay distribution, either `normal`, `pareto`, or `paretonormal`
```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: stress-network-monkey
spec:
  rateSeconds: 300
  periodSeconds: 60
  stress:
    stressStrategy:
      type: All
    network:
      latencyMilliseconds: 500
      jitter: .5
      correlation: .25
```
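The example above leaves the delay distribution at its default; an explicit `distribution` can presumably be set alongside the other `network` fields, as in this sketch:

```yaml
apiVersion: chaos.atomix.io/v1alpha1
kind: ChaosMonkey
metadata:
  name: stress-network-pareto-monkey   # arbitrary example name
spec:
  rateSeconds: 300
  periodSeconds: 60
  stress:
    stressStrategy:
      type: All
    network:
      latencyMilliseconds: 500
      jitter: .5
      correlation: .25
      distribution: pareto              # one of normal, pareto, paretonormal
```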
The chaos controller consists of two independent components which run as containers in a k8s cluster: the controller and the workers.
The controller is the component responsible for monitoring the creation/deletion of `ChaosMonkey` resources, scheduling executions, and distributing tasks to workers. The controller typically runs as a `Deployment`. When multiple replicas are run, only a single replica will control the cluster at any given time.
When `ChaosMonkey` resources are created in the k8s cluster, the controller receives a notification and, in response, schedules a periodic background task to execute the monkey handler. The periodic task is configured based on the monkey configuration. When a monkey handler is executed, the controller filters pods using the monkey's configured selectors and passes the pods to the handler for execution. Monkey handlers then assign tasks to specific workers to carry out the specified chaos function.
The crash controller assigns tasks to workers via `Crash` resources. The `Crash` resource will indicate the `podName` of the pod to crash and the `crashStrategy` with which to crash the pod.
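Based on that description, an individual `Crash` resource created by the controller might look roughly like the sketch below; the exact field layout is an assumption, since these resources are normally created by the controller rather than by hand:

```yaml
# Hypothetical Crash resource layout; field placement is assumed from the
# description above, not taken from the CRD definition.
apiVersion: chaos.atomix.io/v1alpha1
kind: Crash
metadata:
  name: crash-monkey-pod-1          # arbitrary example name
spec:
  podName: pod-1                    # the pod to crash
  crashStrategy:
    type: Container                 # or Pod
```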
The partition controller assigns tasks to workers according to the configured `partitionStrategy` and uses the `NetworkPartition` resource to communicate details of the network partition to the workers. After determining the set of routes to cut off between pods, the controller creates a `NetworkPartition` for each source/destination pair.
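Each source/destination pair could thus be represented by a resource along these lines; as with the `Crash` sketch, the layout is assumed from the `podName` and `sourceName` fields mentioned in this document:

```yaml
# Hypothetical NetworkPartition resource layout; field placement is assumed.
apiVersion: chaos.atomix.io/v1alpha1
kind: NetworkPartition
metadata:
  name: partition-pod-1-pod-2       # arbitrary example name
spec:
  podName: pod-1                    # the pod whose traffic is cut off
  sourceName: pod-2                 # the source whose packets are dropped
```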
The stress controller assigns tasks to workers via `Stress` resources. The `Stress` resource will indicate the `podName` of the pod to stress and the mechanisms with which to stress the pod.
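A `Stress` resource might therefore carry the pod name together with the stress options, roughly as sketched below; again, the layout is an assumption:

```yaml
# Hypothetical Stress resource layout; field placement is assumed.
apiVersion: chaos.atomix.io/v1alpha1
kind: Stress
metadata:
  name: stress-pod-1                # arbitrary example name
spec:
  podName: pod-1                    # the pod to stress
  cpu:
    workers: 2                      # stress options mirror the monkey configuration
```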
Workers are the components responsible for injecting failures on specific k8s nodes.
Like the controller, workers are a type of resource controller, but rather than managing the high-level `ChaosMonkey` resources used to randomly inject failures, workers provide for the injection of pod-level failures in response to the creation of resources like `Crash`, `NetworkPartition`, or `Stress`.
In order to ensure a worker is assigned to each node and can inject failures into the OS, workers must be run in a `DaemonSet` and granted `privileged` access to k8s nodes.
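In a DaemonSet manifest this typically means setting a privileged security context on the worker container, along the lines of the fragment below; the image name and labels are placeholders:

```yaml
# Fragment of a worker DaemonSet; image and labels are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: chaos-worker
spec:
  selector:
    matchLabels:
      app: chaos-worker
  template:
    metadata:
      labels:
        app: chaos-worker
    spec:
      containers:
      - name: worker
        image: example/chaos-worker:latest   # placeholder image
        securityContext:
          privileged: true                   # needed to manipulate host networking and processes
```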
Crash workers monitor k8s for the creation of `Crash` resources. When a `Crash` resource is detected, if the `podName` contained in the `Crash` belongs to the node on which the worker is running, the worker executes the crash. This ensures only one node attempts to execute a crash regardless of the method by which the crash is performed.
The method of execution of the crash depends on the configured `crashStrategy`. If the `Pod` strategy is used, the worker simply deletes the pod via the Kubernetes API. If the `Container` strategy is indicated, the worker locates the pod's container(s) via the Docker API and kills the containers directly.
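Conceptually, the two strategies correspond to operations one could perform by hand; the commands below only illustrate the effect and are not what the worker literally runs (the pod name and container ID are placeholders):

```bash
# Pod strategy: delete the pod through the Kubernetes API.
$ kubectl delete pod pod-1

# Container strategy: kill the pod's container directly through Docker.
$ docker kill 3f1c2a9b8d7e
```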
Partition workers monitor k8s for the creation of `NetworkPartition` resources. Each `NetworkPartition` represents a link between two pods to be cut off while the resource is running. The worker configures the pod indicated by `podName` to drop packets from the configured `sourceName`. `NetworkPartition` resources may be in one of four phases:

* `started` indicates the resource has been created but the pods are not yet partitioned
* `running` indicates the pod has been partitioned
* `stopped` indicates the partition has been stopped but the physical communication has not yet been restored
* `complete` indicates communication between the pod and the source has been restored
When a worker receives notification of a `NetworkPartition` in the `started` phase, if the pod is running on the worker's node, the worker cuts off communication between the pod and the source by locating the virtual network interface for the pod and adding firewall rules to the host to drop packets received on the pod's virtual interface from the source IP. To restore communication with the source, the worker simply deletes the added firewall rules.
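The effect is comparable to adding and later removing an `iptables` rule like the one sketched here; the chain, interface name, and source IP are all placeholders standing in for whatever rules the worker actually generates:

```bash
# Drop packets on the pod's virtual interface coming from the source pod's IP
# (chain, interface, and IP are placeholders, not the worker's exact rule).
$ iptables -A FORWARD -i veth1234abcd -s 10.244.1.15 -j DROP

# Restore communication by deleting the same rule.
$ iptables -D FORWARD -i veth1234abcd -s 10.244.1.15 -j DROP
```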
Stress workers monitor k8s for the creation of `Stress` resources. When a `Stress` resource is detected, the worker on the node to which the stressed pod is assigned may perform several tasks to stress the desired pod.

For I/O, CPU, memory, and HDD stress, the worker will create a container in the pod's namespace to execute the `stress` tool. For each configured stress option, a separate container will be created to stress the pod. When the `Stress` resource is `stopped`, all stress containers will be stopped.
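The `stress` tool maps worker counts onto command-line flags; a single invocation covering the four options from the earlier example might look like this, though how the worker actually splits the options across containers is described above:

```bash
# 2 CPU workers (sqrt), 2 I/O workers (sync), 4 memory workers (malloc/free),
# and 1 HDD worker (write/unlink), matching the earlier example configuration.
$ stress --cpu 2 --io 2 --vm 4 --hdd 1
```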
For network stress, the host is configured using the traffic control utility to control the pod's virtual interfaces. When the `Stress` resource is `stopped`, the traffic control rule is deleted.
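In traffic control terms, the earlier network example (500 ms latency, 50% jitter, 25% correlation) corresponds roughly to a `netem` queueing discipline like the one below; the interface name is a placeholder, and translating `jitter: .5` into 250 ms is an assumption about how the fraction is applied:

```bash
# Add ~500ms of delay with 250ms of jitter and 25% correlation on the pod's
# virtual interface (interface name is a placeholder).
$ tc qdisc add dev veth1234abcd root netem delay 500ms 250ms 25%

# Remove the rule when the Stress resource stops.
$ tc qdisc del dev veth1234abcd root netem
```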