RKE2 keeps restarting with etcd error after cert rotate #7629
Comments
The service is not named rke2:
root@systemd-node-1:/# systemctl stop rke2
Failed to stop rke2.service: Unit rke2.service not loaded.
As per the conversation and debugging with @brandond, this was a timing issue in the test framework.
For the record, in case this helps someone else: the order and timing are the problem. The nodes are all taken down within seconds of each other.
You need to do them one at a time, and wait for the post-rotate
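A minimal sketch of the one-at-a-time approach, in case it is useful. It is not the test framework's code. The `run_on_node` helper, SSH access, the `rke2-server` unit name, and using `systemctl is-active` as the "node is back" signal are all assumptions; substitute whatever readiness check fits your cluster.

```shell
#!/bin/sh
# Sketch: rotate certificates on server nodes strictly one at a time.
# Assumptions (not from the issue): nodes are reachable over SSH, the unit
# is rke2-server, and "active" is a good enough readiness signal.

# run_on_node NODE COMMAND: hypothetical helper that runs COMMAND on NODE.
run_on_node() {
    ro_node=$1
    shift
    ssh "$ro_node" "$@"
}

# wait_until_active NODE [TRIES] [DELAY]: poll until rke2-server is active.
wait_until_active() {
    wa_node=$1
    wa_tries=${2:-60}
    wa_delay=${3:-5}
    wa_i=0
    while [ "$wa_i" -lt "$wa_tries" ]; do
        if run_on_node "$wa_node" "systemctl is-active --quiet rke2-server"; then
            return 0
        fi
        wa_i=$((wa_i + 1))
        sleep "$wa_delay"
    done
    echo "timed out waiting for rke2-server on $wa_node" >&2
    return 1
}

# rotate_sequentially NODE...: stop, rotate, start, then wait, for each node
# in turn. The next node is only touched once the previous one is active.
rotate_sequentially() {
    for rs_node in "$@"; do
        run_on_node "$rs_node" "sudo systemctl stop rke2-server" || return 1
        run_on_node "$rs_node" "sudo rke2 certificate rotate" || return 1
        run_on_node "$rs_node" "sudo systemctl start rke2-server" || return 1
        wait_until_active "$rs_node" || return 1
    done
}
```

It would be called with the server nodes in the order you want them rotated, for example `rotate_sequentially etcd-1 etcd-2 etcd-3 cp-1 cp-2` (hypothetical host names). The function stops at the first failure so the remaining nodes stay up.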
Environmental Info:
RKE2 Version:
RCs
Node(s) CPU architecture, OS, and Version:
RHEL 9.4
Cluster Configuration:
split roles
etcd_only_nodes = 3
cp_only_nodes = 2
no_of_worker_nodes = 1
Describe the bug:
After rotating certificates:
TLS Directory name: tls-1737633868
Comparing Directories: /var/lib/rancher/rke2/server/tls and /var/lib/rancher/rke2/server/tls-1737633868
file client-ca.crt found
file client-ca.key found
file client-ca.nochain.crt found
file peer-ca.crt found
file peer-ca.key found
file server-ca.crt found
file server-ca.key found
file request-header-ca.crt found
file request-header-ca.key found
file server-ca.crt found
file server-ca.key found
file server-ca.nochain.crt found
file service.current.key found
file service.key found
All checks and validations pass. For the test, see https://github.com/rancher/distros-test-framework/blob/main/pkg/testcase/certrotate.go
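For readers without the framework at hand, the directory comparison logged above can be approximated with a short shell function. This is only a sketch: `compare_tls_dirs` is a hypothetical name, it checks file presence by name only (not contents or expiry), and the linked Go test may do more or work differently.

```shell
#!/bin/sh
# compare_tls_dirs CURRENT BACKUP: for every regular file in BACKUP, report
# whether a file with the same name exists in CURRENT.
# Returns non-zero if at least one file is missing.
compare_tls_dirs() {
    ct_current=$1
    ct_backup=$2
    ct_missing=0
    for ct_path in "$ct_backup"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$ct_path" ] || continue
        ct_name=$(basename "$ct_path")
        if [ -f "$ct_current/$ct_name" ]; then
            echo "file $ct_name found"
        else
            echo "file $ct_name missing"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}
```

With the paths from this report it would be run as `compare_tls_dirs /var/lib/rancher/rke2/server/tls /var/lib/rancher/rke2/server/tls-1737633868`.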
But then RKE2 keeps restarting endlessly.
Steps To Reproduce:
etcd_only_nodes = 3
cp_only_nodes = 2
no_of_worker_nodes = 1
On both the etcd and control-plane nodes, run the following for a full certificate rotation:
$ sudo systemctl stop rke2
$ sudo rke2 --debug certificate rotate
$ sudo systemctl start rke2
Restart the agent node: "sudo systemctl restart rke2-agent"
Expected behavior:
The RKE2 cluster should be up and running after the rotation.
Actual behavior:
RKE2 keeps restarting endlessly.
Additional context / logs:
Control plane logs
ETCD LOGS
EDIT:
Commands used: