To quickly enable the NVIDIA GPUs in your OpenShift cluster, for instance for use with Red Hat OpenShift Data Science (RHODS), follow these steps:
- Install the Node Feature Discovery Operator through OperatorHub (choose the entry provided by Red Hat).
- After installing the operator, view the operator details and select Create instance in the `NodeFeatureDiscovery` tile. In the configuration form view, scroll down and select Create to apply the default configuration.
- Install the NVIDIA GPU Operator through OperatorHub.
- After installing the operator, view the operator details and select Create instance in the `ClusterPolicy` view. In the configuration form view, scroll down and select Create to apply the default configuration.

This kicks off scanning and labeling of your worker nodes according to the GPUs they provide. The process should take around 5-10 minutes. You can verify it by inspecting the labels of a GPU-enabled worker node (navigate to Compute -> Nodes -> [one of your workers with GPUs] -> YAML). Once the `nvidia.com/gpu.count` label shows up with a non-zero value, the GPUs are ready for use.
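
If you prefer a terminal over the web console, the same label check can be sketched in Python. This is a minimal example, not part of the official procedure: it assumes the `oc` CLI is installed and logged in to the cluster, and the function names `gpu_counts` and `fetch_nodes` are hypothetical helpers introduced here for illustration.

```python
import json
import subprocess


def gpu_counts(node_list: dict) -> dict:
    """Map node name -> GPU count for nodes whose nvidia.com/gpu.count label is non-zero."""
    counts = {}
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels") or {}
        raw = labels.get("nvidia.com/gpu.count", "0")
        try:
            count = int(raw)
        except ValueError:
            continue  # skip nodes whose label value is not a number
        if count > 0:
            counts[node["metadata"]["name"]] = count
    return counts


def fetch_nodes() -> dict:
    """Return the cluster's node list as parsed JSON (assumes a logged-in `oc` CLI)."""
    result = subprocess.run(
        ["oc", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `gpu_counts(fetch_nodes())` returns a dictionary of GPU-enabled nodes and their GPU counts. An empty dictionary means the labeling has not finished yet, or that no GPUs were detected.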

As a final verification in RHODS, use the `nvidia-smi` notebook in the `notebooks` folder within a CUDA-enabled workbench.
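
If that notebook is not at hand, a similar check can be run from any notebook cell in the workbench. The sketch below is an assumption-laden alternative, not the contents of the `nvidia-smi` notebook: it assumes the `nvidia-smi` binary is on the `PATH` of the workbench image, and `parse_gpu_list` and `list_gpus` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text: str) -> list:
    """Turn 'name, memory' CSV lines from nvidia-smi into (name, memory) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus() -> list:
    """Return the GPUs visible to this workbench, or [] if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_list(result.stdout)
```

Running `list_gpus()` in the workbench should return one entry per GPU assigned to it. An empty list suggests the workbench image lacks `nvidia-smi`, while an error raised by the command suggests no GPU was attached to the workbench.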