cri-o configured with cgroupfs cgroup manager, but received systemd slice as parent #893

Open
felipecrs opened this issue Dec 18, 2024 · 3 comments

Comments

Contributor

felipecrs commented Dec 18, 2024

Tried installing sysbox 0.6.5 on a fresh cluster, and after deploying the manifest, pods in the cluster no longer start.

Describing the pods I can see this issue:

Warning  FailedCreatePodSandBox  2s (x17 over 3m25s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = cri-o configured with cgroupfs cgroup manager, but received systemd slice as parent: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod368236bb_a128_48e2_a105_0b2c1c7a24e5.slice
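For readers hitting the same event: the parent path in the message uses systemd-style naming (each component ends in `.slice`), while CRI-O is set to `cgroupfs`. A minimal sketch for classifying such a path follows. The function name `cgroup_parent_style` is hypothetical, and the rule "last component ends in `.slice` means systemd" is an assumption inferred from the error text, not taken from the CRI-O source.

```shell
# Hypothetical helper: classify a cgroup parent path by its naming style.
# Assumption: a last path component ending in ".slice" means systemd naming;
# anything else is treated as a cgroupfs-style path.
cgroup_parent_style() {
  case "${1##*/}" in
    *.slice) echo systemd ;;
    *)       echo cgroupfs ;;
  esac
}

# The parent path from the event above.
cgroup_parent_style "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod368236bb_a128_48e2_a105_0b2c1c7a24e5.slice"
```

If the style printed here differs from the `cgroup_manager` value in `/etc/crio/crio.conf`, the Kubelet and CRI-O disagree on the cgroup driver.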

It's a vanilla Kubernetes 1.28 cluster with Ubuntu 22.04 nodes, kernel 5.15.

Seems similar to #567, but it's kinda the opposite.

Contributor Author

felipecrs commented Dec 18, 2024

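My workaround ATM is to run the following on each node:

set -ex
# Only touch the config if CRI-O is currently set to the cgroupfs manager.
if grep 'cgroup_manager = "cgroupfs"' /etc/crio/crio.conf; then
  sudo sed -i 's,cgroup_manager = "cgroupfs",cgroup_manager = "systemd",' /etc/crio/crio.conf
  # Confirm the substitution took effect; set -e aborts before the restart if it did not.
  grep 'cgroup_manager = "systemd"' /etc/crio/crio.conf
  sudo systemctl restart crio
fi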

Member

ctalledo commented Jan 23, 2025

Hi @felipecrs, thanks for reporting and apologies for the belated reply.

Strange that you see this problem.

Here is the relevant code in sysbox-deploy-k8s that tries to configure CRI-O to use the same cgroup manager as the Kubelet:
https://github.com/nestybox/sysbox-pkgr/blob/master/k8s/scripts/kubelet-config-helper.sh#L861

The code first gets the Kubelet cgroup manager, and then configures CRI-O to use the same cgroup manager as the Kubelet. Then restarts CRI-O. So it's similar to the work-around you have above.
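The flow described above can be sketched roughly as follows. This is not the sysbox-deploy-k8s code: the function names are hypothetical, and it assumes the Kubelet's driver is set through the `cgroupDriver` field of a KubeletConfiguration file (commonly `/var/lib/kubelet/config.yaml` on kubeadm clusters) rather than through a command-line flag. The fallback to `cgroupfs` when the field is missing is also an assumption.

```shell
# Hypothetical sketch; the real logic lives in kubelet-config-helper.sh.

# Print the Kubelet cgroup driver found in a KubeletConfiguration file ($1).
# Assumption: fall back to "cgroupfs" when the field is absent.
get_kubelet_cgroup_driver() {
  driver=$(sed -n 's/^cgroupDriver:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'[:space:]")
  echo "${driver:-cgroupfs}"
}

# Rewrite the cgroup_manager line in a crio.conf file ($2) to the driver ($1).
# Commented-out cgroup_manager lines are left untouched.
set_crio_cgroup_manager() {
  tmp=$(mktemp)
  sed "s/^\([[:space:]]*\)cgroup_manager[[:space:]]*=.*/\1cgroup_manager = \"$1\"/" "$2" > "$tmp" \
    && cat "$tmp" > "$2"
  rm -f "$tmp"
}

# Example, run as root on a node (paths are assumptions):
#   driver=$(get_kubelet_cgroup_driver /var/lib/kubelet/config.yaml)
#   set_crio_cgroup_manager "$driver" /etc/crio/crio.conf
#   systemctl restart crio
```

Note how a failed or empty lookup in the first step silently yields `cgroupfs` in this sketch; a detection failure of that kind would leave CRI-O on `cgroupfs` while the Kubelet hands it systemd slices.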

But I suspect the problem occurs a bit before that, when it's trying to find out the cgroup manager for the Kubelet. That part must be failing somehow, causing the problem you are seeing.

I've not been able to repro yet, but we can add some logging in there to try to debug it, and I can give you a development sysbox-deploy-k8s image to try.

Let me know.

Thanks!

Contributor Author

> I've not been able to repro yet, but we can add some logging in there to try to debug it, and I can give you a development sysbox-deploy-k8s image to try.

Yes, if you provide this image with additional logging, I will redo the test and share the additional information.
