Merge pull request #240 from machadovilaca/update-runbooks-based-on-downstream-wording

Update runbooks based on downstream wording
sradco authored May 15, 2024
2 parents e29dee0 + 5d23935 commit a7f1cb6
Showing 45 changed files with 146 additions and 119 deletions.
2 changes: 1 addition & 1 deletion docs/deprecated_runbooks/KubeMacPoolDown.md
@@ -43,7 +43,7 @@ If `KubeMacPool` is down, `VirtualMachine` objects cannot be created.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

@@ -88,7 +88,7 @@ the node.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
See the [HCO cluster configuration documentation](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md#enablecommonbootimageimport-feature-gate)
for more information.
11 changes: 5 additions & 6 deletions docs/runbooks/CDIDataImportCronOutdated.md
@@ -93,11 +93,10 @@ specification, to poll and import golden images. The updated Containerized Data
Importer (CDI) should resolve the issue within a few seconds.

2. If the issue does not resolve itself, or, if you have changed the default
storage class in the cluster,
you must delete the existing boot sources (datavolumes or volumesnapshots) in
the cluster namespace that are configured with the previous default storage
class. The CDI will recreate the data volumes with the newly configured default
storage class.
storage class in the cluster, you must delete the existing boot sources
(datavolumes or volumesnapshots) in the cluster namespace that are configured
with the previous default storage class. The CDI will recreate the data volumes
with the newly configured default storage class.

3. If your cluster is installed in a restricted network environment, disable the
`enableCommonBootImageImport` feature gate in order to opt out of automatic
@@ -109,7 +108,7 @@ updates:

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
See the [HCO cluster configuration documentation](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md#enablecommonbootimageimport-feature-gate)
for more information.
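
The feature gate named in step 3 lives in the `HyperConverged` CR. A minimal sketch of the relevant fields, with the `spec.featureGates.enableCommonBootImageImport` path assumed from the linked HCO cluster configuration documentation:

```yaml
# Sketch only: verify the field path against your HCO version before applying.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: kubevirt-hyperconverged
spec:
  featureGates:
    enableCommonBootImageImport: false  # opt out of automatic boot source updates
```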
4 changes: 2 additions & 2 deletions docs/runbooks/CDIDataVolumeUnusualRestartCount.md
@@ -13,7 +13,7 @@ issue.

## Diagnosis

1. Find CDI pods with more than three restarts:
1. Find Containerized Data Importer (CDI) pods with more than three restarts:

```bash
$ kubectl get pods --all-namespaces -l app=containerized-data-importer -o=jsonpath='{range .items[?(@.status.containerStatuses[0].restartCount>3)]}{.metadata.name}{"/"}{.metadata.namespace}{"\n"}'
```

@@ -37,7 +37,7 @@ Delete the data volume, resolve the issue, and create a new data volume.
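
As a hedged sketch of this mitigation, the replacement data volume could be recreated from a manifest like the one below (the name, namespace, source URL, and size are placeholder assumptions):

```yaml
# Placeholder values throughout; adjust to match the data volume you deleted.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: example-dv          # placeholder name
  namespace: example-ns     # placeholder namespace
spec:
  source:
    http:
      url: https://example.com/image.qcow2  # placeholder image source
  storage:
    resources:
      requests:
        storage: 5Gi  # placeholder size
```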

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

43 changes: 29 additions & 14 deletions docs/runbooks/CDIDefaultStorageClassDegraded.md
@@ -2,57 +2,72 @@

## Meaning

This alert fires when the default (Kubernetes or virtualization) storage class
supports smart clone (either CSI or snapshot based) and ReadWriteMany.
This alert fires when there is no default storage class that supports smart
cloning (CSI or snapshot-based) or the ReadWriteMany access mode.

A default virtualization storage class has precedence over a default Kubernetes
storage class for creating a VirtualMachine disk image.

## Impact

If the default storage class does not support smart clone, we fallback to
host-assisted cloning, which is the least efficient method of cloning.
If the default storage class does not support smart cloning, the default cloning
method is host-assisted cloning, which is much less efficient.

If the default storage class does not suppprt ReadWriteMany, a virtual machine
using it is not live-migratable.
If the default storage class does not support ReadWriteMany, virtual machines
(VMs) cannot be live migrated.

<!--DS: Note: A default OpenShift Virtualization storage class has precedence
over a default OpenShift Container Platform storage class when creating a
VM disk.-->

## Diagnosis

Get the default virtualization storage class:
1. Get the default KubeVirt storage class by running the following command:

```bash
$ export CDI_DEFAULT_VIRT_SC="$(kubectl get sc -o json | jq -r '.items[].metadata|select(.annotations."storageclass.kubevirt.io/is-default-virt-class"=="true")|.name')"
$ echo default_virt_sc=$CDI_DEFAULT_VIRT_SC
```

If the default virtualization storage class is set, check if it supports
ReadWriteMany
2. If a default KubeVirt storage class exists, check that it supports
ReadWriteMany by running the following command:

```bash
$ kubectl get storageprofile $CDI_DEFAULT_VIRT_SC -o json | jq '.status.claimPropertySets'| grep ReadWriteMany
```

Otherwise, if the default virtualization storage class is not set, get the
default Kubernetes storage class:
3. If there is no default KubeVirt storage class, get the default Kubernetes
storage class by running the following command:

```bash
$ export CDI_DEFAULT_K8S_SC="$(kubectl get sc -o json | jq -r '.items[].metadata|select(.annotations."storageclass.kubernetes.io/is-default-class"=="true")|.name')"
$ echo default_k8s_sc=$CDI_DEFAULT_K8S_SC
```

If the default Kubernetes storage class is set, check if it supports
ReadWriteMany:
4. If a default Kubernetes storage class exists, check that it supports
ReadWriteMany by running the following command:

```bash
$ kubectl get storageprofile $CDI_DEFAULT_K8S_SC -o json | jq '.status.claimPropertySets'| grep ReadWriteMany
```

<!--USstart-->
See [doc](https://github.com/kubevirt/containerized-data-importer/blob/main/doc/efficient-cloning.md)
for details about smart clone prerequisites.
<!--USend-->

## Mitigation

Ensure that the default storage class supports smart clone and ReadWriteMany.
Ensure that you have a default storage class, either Kubernetes or KubeVirt, and
that the default storage class supports smart cloning and ReadWriteMany.
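
The default-class annotations checked in the Diagnosis steps above can be set directly on a `StorageClass`. A minimal sketch, where the class name and provisioner are placeholders and only one class should carry each annotation:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-sc  # placeholder: your storage class
  annotations:
    # Kubernetes default; use storageclass.kubevirt.io/is-default-virt-class
    # instead for a KubeVirt (virtualization) default.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: example.csi.driver  # placeholder provisioner
```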

<!--USstart-->
If you cannot resolve the issue, see the following resources:

- [OKD Help](https://www.okd.io/help/)
- [#virtualization Slack channel](https://kubernetes.slack.com/channels/virtualization)
<!--USend-->

<!--DS: If you cannot resolve the issue, log in to the
[Customer Portal](https://access.redhat.com) and open a support case, attaching
the artifacts gathered during the diagnosis procedure.-->
2 changes: 1 addition & 1 deletion docs/runbooks/CDIMultipleDefaultVirtStorageClasses.md
@@ -29,7 +29,7 @@ annotation.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

46 changes: 28 additions & 18 deletions docs/runbooks/CDINoDefaultStorageClass.md
@@ -2,48 +2,58 @@

## Meaning

This alert fires when there is no default (Kubernetes or virtualization) storage
This alert fires when there is no default (Kubernetes or KubeVirt) storage
class, and a data volume is pending for one.

A default virtualization storage class has precedence over a default Kubernetes
A default KubeVirt storage class has precedence over a default Kubernetes
storage class for creating a VirtualMachine disk image.

## Impact

If there is no default (k8s or virt) storage class, a data volume that requests
a default storage class (storage class not explicitly specified) will be pending
for one.
If there is no default Kubernetes or KubeVirt storage class, a data volume that
does not have a specified storage class remains in a "pending" state.

## Diagnosis

Get the default Kubernetes storage class:
1. Check for a default Kubernetes storage class by running the following
command:

```bash
$ kubectl get sc -o json | jq '.items[].metadata|select(.annotations."storageclass.kubernetes.io/is-default-class"=="true")|.name'
```

Get the default virtualization storage class:
2. Check for a default KubeVirt storage class by running the following command:

```bash
$ kubectl get sc -o json | jq '.items[].metadata|select(.annotations."storageclass.kubevirt.io/is-default-virt-class"=="true")|.name'
```

To set the default Kubernetes storage class if needed:
```bash
$ kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
## Mitigation

To set the default virtualization storage class if needed:
```bash
$ kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations":{"storageclass.kubevirt.io/is-default-virt-class":"true"}}}'
```
Create a default storage class for either Kubernetes or KubeVirt or for both.

## Mitigation
A default KubeVirt storage class has precedence over a default Kubernetes
storage class for creating a virtual machine disk image.

* Create a default Kubernetes storage class by running the following command:

Ensure that there is one storage class that has the default (k8s or virt)
storage class annotation.
```bash
$ kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

* Create a default KubeVirt storage class by running the following command:

```bash
$ kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations":{"storageclass.kubevirt.io/is-default-virt-class":"true"}}}'
```

<!--USstart-->
If you cannot resolve the issue, see the following resources:

- [OKD Help](https://www.okd.io/help/)
- [#virtualization Slack channel](https://kubernetes.slack.com/channels/virtualization)
<!--USend-->

<!--DS: If you cannot resolve the issue, log in to the
[Customer Portal](https://access.redhat.com) and open a support case,
attaching the artifacts gathered during the diagnosis procedure.-->
4 changes: 2 additions & 2 deletions docs/runbooks/CDINotReady.md
@@ -2,7 +2,7 @@

## Meaning

This alert fires when the containerized data importer (CDI) is in a degraded
This alert fires when the Containerized Data Importer (CDI) is in a degraded
state:

- Not progressing
@@ -46,7 +46,7 @@ Try to identify the root cause and resolve the issue.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

2 changes: 1 addition & 1 deletion docs/runbooks/CDIOperatorDown.md
@@ -42,7 +42,7 @@ installation might not function correctly.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->

<!--USstart-->
If you cannot resolve the issue, see the following resources:
2 changes: 1 addition & 1 deletion docs/runbooks/CDIStorageProfilesIncomplete.md
@@ -37,7 +37,7 @@ for more details about storage profiles.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

4 changes: 2 additions & 2 deletions docs/runbooks/CnaoDown.md
@@ -2,7 +2,7 @@

## Meaning

This alert fires when the `Cluster-network-addons-operator` (CNAO) is down.
This alert fires when the Cluster Network Addons Operator (CNAO) is down.
The CNAO deploys additional networking components on top of the cluster.

## Impact
@@ -40,7 +40,7 @@ machine components. As a result, the changes might fail to take effect.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

35 changes: 19 additions & 16 deletions docs/runbooks/HCOInstallationIncomplete.md
@@ -13,19 +13,22 @@ uninstalling the HCO and the HCO is still running.

## Mitigation

Installation: Complete the installation by creating a `HyperConverged` CR with
its default values:

```bash
$ cat <<EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: hco-operatorgroup
namespace: kubevirt-hyperconverged
spec: {}
EOF
```

Uninstall: Uninstall the HCO. If the uninstall process continues to run, you
must resolve that issue in order to cancel the alert.
The mitigation depends on whether you are installing or uninstalling
the HCO:

- Complete the installation by creating a `HyperConverged` CR with its
default values:

```bash
$ cat <<EOF | kubectl apply -f -
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
name: kubevirt-hyperconverged
namespace: kubevirt-hyperconverged
spec: {}
EOF
```
- Uninstall the HCO. If the uninstall process continues to run, you must
resolve that issue in order to cancel the alert.
6 changes: 3 additions & 3 deletions docs/runbooks/HPPNotReady.md
@@ -41,12 +41,12 @@ towards a ready state.

## Mitigation

Based on the information obtained during Diagnosis, try to find and resolve the
cause of the issue.
Based on the information obtained during the diagnosis procedure, try to
identify the root cause and resolve the issue.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->

<!--USstart-->
If you cannot resolve the issue, see the following resources:
6 changes: 3 additions & 3 deletions docs/runbooks/HPPOperatorDown.md
@@ -40,12 +40,12 @@ result, the HPP installation might not work correctly in the cluster.

## Mitigation

Based on the information obtained during Diagnosis, try to find and resolve the
cause of the issue.
Based on the information obtained during the diagnosis procedure, try to
identify the root cause and resolve the issue.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

2 changes: 1 addition & 1 deletion docs/runbooks/HPPSharingPoolPathWithOS.md
@@ -47,7 +47,7 @@ other circumstances.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

4 changes: 2 additions & 2 deletions docs/runbooks/KubeVirtNoAvailableNodesToRunVMs.md
@@ -2,7 +2,7 @@

## Meaning

The KubeVirtNoAvailableNodesToRunVMs alert is triggered when all nodes in the
The `KubeVirtNoAvailableNodesToRunVMs` alert is triggered when all nodes in the
Kubernetes cluster are missing hardware virtualization or CPU virtualization
extensions. This means that the cluster does not have the necessary hardware
support to run virtual machines (VMs).
@@ -32,7 +32,7 @@ hardware virtualization or CPU virtualization extensions enabled.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

2 changes: 1 addition & 1 deletion docs/runbooks/KubeVirtVMIExcessiveMigrations.md
@@ -103,7 +103,7 @@ If the problem persists, try to identify the root cause and resolve the issue.

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the Diagnosis procedure.-->
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:
