A good practice is to start the bootstrap VM first and then, step by step, all the other machines. They will power on and boot. Because of the --pxe flag, each VM sends a DHCP broadcast requesting a PXE boot. The DHCP server picks up this broadcast and replies with an IP address lease plus the address of the boot server, which in this case is the services VM. From there the proper FCOS image and Ignition file are fetched and the installation begins.
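If you want to watch this handshake while the nodes come up, you can follow the DHCP/TFTP logs on the services VM. The unit name below assumes dnsmasq provides DHCP and TFTP; if your services VM runs a different DHCP server, check that service's logs instead.
[okd@services ~]$ sudo journalctl -f -u dnsmasq
Now create and PXE boot the bootstrap, master, compute and infra nodes: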
[okd@okd ~]$ declare -A nodes \
nodes["bootstrap"]="f8:75:a4:ac:01:00" \
nodes["compute-0"]="f8:75:a4:ac:02:00" \
nodes["compute-1"]="f8:75:a4:ac:02:01" \
nodes["compute-2"]="f8:75:a4:ac:02:02" \
nodes["master-0"]="f8:75:a4:ac:03:00" \
nodes["master-1"]="f8:75:a4:ac:03:01" \
nodes["master-2"]="f8:75:a4:ac:03:02" \
nodes["infra-0"]="f8:75:a4:ac:04:00" \
nodes["infra-1"]="f8:75:a4:ac:04:01" \
nodes["infra-2"]="f8:75:a4:ac:04:02" ; \
for key in ${!nodes[@]} ; \
do \
virt-install \
-n ${key}.$HOSTNAME \
--description "${key}.$HOSTNAME" \
--os-type=Linux \
--os-variant=fedora36 \
--ram=16384 \
--vcpus=4 \
--disk ~/images/${key}.$HOSTNAME.0.qcow2,bus=virtio,size=128 \
--nographics \
--pxe \
--network network=okd,mac=${nodes[${key}]} \
--boot menu=on,useserial=on --noreboot --noautoconsole ; \
done
[okd@okd ~]$ declare -A storage \
storage["storage-0"]="f8:75:a4:ac:05:00" \
storage["storage-1"]="f8:75:a4:ac:05:01" \
storage["storage-2"]="f8:75:a4:ac:05:02" ; \
for key in ${!storage[@]} ; \
do \
virt-install \
-n ${key}.$HOSTNAME \
--description "${key}.$HOSTNAME" \
--os-type=Linux \
--os-variant=fedora36 \
--ram=32768 \
--vcpus=8 \
--disk ~/images/${key}.$HOSTNAME.0.qcow2,bus=virtio,size=128 \
--disk ~/images/${key}.$HOSTNAME.1.qcow2,bus=virtio,size=256 \
--nographics \
--pxe \
--network network=okd,mac=${storage[${key}]} \
--boot menu=on,useserial=on --noreboot --noautoconsole ; \
done
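If a node does not receive the IP address you expect, first verify that its network interface really carries the reserved MAC address; shown here for the bootstrap VM as an example:
[okd@okd ~]$ virsh domiflist bootstrap.$HOSTNAME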
You can check the current state of the installation with:
[okd@okd ~]$ watch virsh list --all
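If a node appears to hang, you can attach to its serial console (the VMs were created with --nographics and useserial=on); press Ctrl+] to detach again:
[okd@okd ~]$ virsh console bootstrap.$HOSTNAME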
Because the VMs were created with --noreboot, each node powers off after Fedora CoreOS has been written to disk. Once the services VM is the only one left running, power on all virtual machines again:
[okd@okd ~]$ for node in \
bootstrap \
master-0 master-1 master-2 \
compute-0 compute-1 compute-2 \
infra-0 infra-1 infra-2 \
storage-0 storage-1 storage-2 ; \
do \
virsh autostart $node.$HOSTNAME ; \
virsh start $node.$HOSTNAME ; \
done
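The installer itself can report when the bootstrap phase has finished. A minimal sketch, assuming the installation assets were generated in ~/installer on the services VM (the same directory the kubeconfig is copied from below):
[okd@services ~]$ openshift-install --dir ~/installer wait-for bootstrap-complete --log-level info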
Once bootstrapping is complete, check whether the cluster is up by running the following commands on the services VM:
[okd@services ~]$ \cp ~/installer/auth/kubeconfig ~/
[okd@services ~]$ echo "export KUBECONFIG=~/kubeconfig" >> ~/.bash_profile
[okd@services ~]$ source ~/.bash_profile
[okd@services ~]$ watch oc whoami
system:admin
The cluster is now bootstrapped, but a few more steps are required before the installation can be considered complete.
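One convenient way to keep an eye on the remaining installation progress is to watch the ClusterVersion resource; this is optional and just one way to do it:
[okd@services ~]$ watch oc get clusterversion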
If you run into any trouble, take a look at the official OKD documentation first. If you are sure that you have found a bug related to OKD, create a new issue here.
When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must verify that these CSRs are approved or, if necessary, approve them yourself. Because we PXE-booted all nodes with the proper Ignition files in place, the first CSRs should show up after a few minutes.
Review the pending CSRs and ensure that you see a client and a server request with Pending or Approved status for each machine that you added to the cluster:
[okd@services ~]$ oc get csr
Because the initial CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster.
Manually approve CSRs if they are pending:
[okd@services ~]$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
This command might need to be executed several times, because additional CSRs are created as the nodes come up.
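If you prefer not to re-run it by hand, a small loop in the same style keeps approving until nothing is pending any more. This is only a sketch: --no-run-if-empty is GNU xargs, and new server CSRs may still appear after the loop exits, so check oc get csr once more afterwards.
[okd@services ~]$ while oc get csr | grep -q Pending ; \
do \
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
| xargs --no-run-if-empty oc adm certificate approve ; \
sleep 30 ; \
done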
After that, the status of each CSR should become Approved,Issued and all nodes should be in the Ready state.
[okd@services ~]$ oc get nodes
NAME                        STATUS   ROLES           AGE    VERSION
compute-0.okd.example.com   Ready    worker          159m   v1.24.6+5157800
compute-1.okd.example.com   Ready    worker          159m   v1.24.6+5157800
compute-2.okd.example.com   Ready    worker          159m   v1.24.6+5157800
infra-0.okd.example.com     Ready    worker          159m   v1.24.6+5157800
infra-1.okd.example.com     Ready    worker          159m   v1.24.6+5157800
infra-2.okd.example.com     Ready    worker          159m   v1.24.6+5157800
master-0.okd.example.com    Ready    master,worker   167m   v1.24.6+5157800
master-1.okd.example.com    Ready    master,worker   167m   v1.24.6+5157800
master-2.okd.example.com    Ready    master,worker   167m   v1.24.6+5157800
storage-0.okd.example.com   Ready    worker          159m   v1.24.6+5157800
storage-1.okd.example.com   Ready    worker          159m   v1.24.6+5157800
storage-2.okd.example.com   Ready    worker          159m   v1.24.6+5157800
The cluster is fully up and running once all cluster operators become available.
[okd@services ~]$ oc get clusteroperator
NAME                                       VERSION                          AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.okd-2022-10-28-153352   True        False         False      6m26s
baremetal                                  4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
cloud-controller-manager                   4.11.0-0.okd-2022-10-28-153352   True        False         False      24m
cloud-credential                           4.11.0-0.okd-2022-10-28-153352   True        False         False      25m
cluster-autoscaler                         4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
config-operator                            4.11.0-0.okd-2022-10-28-153352   True        False         False      22m
console                                    4.11.0-0.okd-2022-10-28-153352   True        False         False      8m57s
csi-snapshot-controller                    4.11.0-0.okd-2022-10-28-153352   True        False         False      22m
dns                                        4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
etcd                                       4.11.0-0.okd-2022-10-28-153352   True        False         False      20m
image-registry                             4.11.0-0.okd-2022-10-28-153352   True        False         False      13m
ingress                                    4.11.0-0.okd-2022-10-28-153352   True        False         False      13m
insights                                   4.11.0-0.okd-2022-10-28-153352   True        False         False      4s
kube-apiserver                             4.11.0-0.okd-2022-10-28-153352   True        False         False      16m
kube-controller-manager                    4.11.0-0.okd-2022-10-28-153352   True        False         False      19m
kube-scheduler                             4.11.0-0.okd-2022-10-28-153352   True        False         False      18m
kube-storage-version-migrator              4.11.0-0.okd-2022-10-28-153352   True        False         False      22m
machine-api                                4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
machine-approver                           4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
machine-config                             4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
marketplace                                4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
monitoring                                 4.11.0-0.okd-2022-10-28-153352   True        False         False      12m
network                                    4.11.0-0.okd-2022-10-28-153352   True        False         False      22m
node-tuning                                4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
openshift-apiserver                        4.11.0-0.okd-2022-10-28-153352   True        False         False      13m
openshift-controller-manager               4.11.0-0.okd-2022-10-28-153352   True        False         False      17m
openshift-samples                          4.11.0-0.okd-2022-10-28-153352   True        False         False      8m42s
operator-lifecycle-manager                 4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
operator-lifecycle-manager-catalog         4.11.0-0.okd-2022-10-28-153352   True        False         False      21m
operator-lifecycle-manager-packageserver   4.11.0-0.okd-2022-10-28-153352   True        False         False      14m
service-ca                                 4.11.0-0.okd-2022-10-28-153352   True        False         False      22m
storage                                    4.11.0-0.okd-2022-10-28-153352   True        False         False      22m
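Instead of polling this table by hand, you can also let oc block until every cluster operator reports Available; a sketch, with a timeout you may want to adjust:
[okd@services ~]$ oc wait clusteroperator --all --for=condition=Available=True --timeout=30m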
Once the cluster is up and running, it is safe to remove the temporary bootstrap node.
[okd@okd ~]$ virsh shutdown bootstrap.$HOSTNAME
[okd@okd ~]$ virsh undefine bootstrap.$HOSTNAME
[okd@okd ~]$ rm -rf ~/images/bootstrap.$HOSTNAME.0.qcow2
[okd@services ~]$ sudo sed -i '/bootstrap/d' /etc/haproxy/haproxy.cfg
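Before restarting, you can let HAProxy validate the modified configuration:
[okd@services ~]$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg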
[okd@services ~]$ sudo systemctl restart haproxy
Next: Authentication