Merge pull request #70 from clobrano/update-docs-with-out-of-service-taint-information-0

update docs with out-of-service taint information
openshift-merge-bot[bot] authored May 23, 2024
2 parents 2658fa7 + 144a8c2 commit 8e774a2
Showing 1 changed file with 20 additions and 12 deletions.
recover_affected_workloads.md

In the Kubernetes world there is no concept of rescheduling. When pods are bound to a
node they run until termination. In the case of a hardware failure, or some other
resource problem that freezes the OS, there is no built-in solution to migrate or
restart existing pods.

If the node misbehaves and processes can no longer run, the control
plane will start counting down. Either the kubelet process stops
reporting status to the control plane (see [node_status_update_frequency][]), or
the containers themselves fail/crash and are marked as `phase: failed`.
Node status conditions, set by the node-controller in the control plane,
reflect the current status and how long the node has been in that state.
For example, when the node fails to update its status to the control plane,
the node condition will be set to `type:Ready status:Unknown`.
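
As an illustration, this is roughly what the `Ready` condition of an unresponsive node
looks like (the node name is hypothetical and the output is trimmed; exact timestamps and
formatting vary by cluster):

```console
$ kubectl describe node worker-1
...
Conditions:
  Type    Status    LastHeartbeatTime   LastTransitionTime   Reason              Message
  ----    ------    -----------------   ------------------   ------              -------
  Ready   Unknown   <timestamp>         <timestamp>          NodeStatusUnknown   Kubelet stopped posting node status.
...
```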

In order to get the workload back, reschedule it, and start it all over, the control plane
needs to know that the workload's pod is not running anymore. There are two
strategies to accomplish that:

- [`OutOfServiceTaint`][] - based on the new core Kubernetes feature named
[Non-Graceful Node Shutdown][], it triggers pods on the node to be forcefully deleted if
there are no matching tolerations on the pods. Persistent volumes attached to the shutdown
node will be detached, and new pods will be created on a different running node. This
is the recommended strategy (see the taint example after this list).

- `ResourceDeletion` - it deletes the pods on the node. This strategy is still supported for
backward compatibility with clusters running Kubernetes versions that do not support the
Non-Graceful Node Shutdown feature, but it is discouraged.
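
For reference, the `OutOfServiceTaint` strategy relies on the out-of-service taint described
in the Kubernetes documentation linked above. The remediation operator adds and removes the
taint automatically; applied by hand it would look roughly like this (node name is
hypothetical):

```console
$ kubectl taint nodes worker-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/worker-1 tainted
```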

# Pod recovery flow

After a successful unpublishing the pod and the volume can start again, on a different
node.
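
One way to watch this happen is through the cluster's `VolumeAttachment` objects, which
record which node a CSI volume is attached to; after remediation the attachment to the
failed node is removed and a new one is created for the node that picked up the pod. The
names and driver below are made up for illustration:

```console
$ kubectl get volumeattachments
NAME                   ATTACHER           PV             NODE       ATTACHED   AGE
csi-5f2a9c31e0b7d4a8   rbd.csi.ceph.com   pvc-0a1b2c3d   worker-2   true       45s
```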

## Virtualized worker node

Nodes which are VMs should be configured with a watchdog device (see [libvirt's watchdog support][]).
In case it's not configured, self-node-remediation falls back to rebooting the machine
using `systemctl reboot`. The benefit of a virtualized watchdog is that it runs
on the hypervisor and is not vulnerable to resource starvation problems at the VM level,
resulting in a quicker reboot action.
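
As a sketch of what that configuration looks like (the domain name and watchdog model are
examples; available models depend on the hypervisor), the device can be checked on the
hypervisor and inside the guest:

```console
# On the hypervisor: the guest's libvirt domain XML should contain a watchdog device
$ virsh dumpxml worker-1 | grep -A1 '<watchdog'
<watchdog model='i6300esb' action='reset'/>

# Inside the guest: the emulated watchdog shows up as a /dev/watchdog device
$ ls /dev/watchdog*
/dev/watchdog  /dev/watchdog0
```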

TBD

[node_status_update_frequency]: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
[`OutOfServiceTaint`]: https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service
[Non-Graceful Node Shutdown]: https://kubernetes.io/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/
[libvirt's watchdog support]: https://libvirt.org/formatdomain.html#watchdog-device
