
[SURE-9496] Fleet growing local etcd database to 4 GB when deploying bundles downstream cluster #3202

Open
kkaempf opened this issue Jan 10, 2025 · 0 comments

SURE-9496

Issue description:

Customer is attempting to build RKE2 clusters using Fleet. They have 4 Git repos: 1 to build the cluster and 3 that apply their deployments. Building the cluster with Fleet works as expected, creating resources against the provisioning.cattle.io API and creating a vSphere config item. But deploying their other repos causes the etcd db to grow quickly to 4 GB. Initially it was filling the db and crashing etcd, before we raised the max size to 8 GB. Customer is using a setup very similar to https://www.suse.com/c/rancher_blog/fleet-multi-cluster-deployment-with-the-help-of-external-secrets/ to deploy ESO (the ESO Helm chart is simply hosted in their JFrog instance in this case).

Business impact:

Without increasing the etcd db max size to 8 GB, the etcd db would fill up and crash. Customer is hesitant to use Fleet in a production environment because the db hits 4 GB with a single cluster of 6 nodes. They're looking to run many downstream clusters, some with 100+ nodes, and are concerned about the etcd db size.

Troubleshooting steps:

We built the cluster, deployed bundles to it, let the etcd size stabilize, and then compacted the etcd db. It goes down to roughly 50 MB, then begins climbing again until it reaches 4 GB.
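
For reference, a minimal sketch of the compaction step we ran, using the Go etcd client (go.etcd.io/etcd/client/v3). The endpoint is a placeholder and the TLS/client-cert setup a real RKE2/Rancher local-cluster etcd requires is omitted; this is not the exact tooling used, just the same idea.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoint := "https://127.0.0.1:2379" // placeholder; point at the local cluster's etcd

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
		// TLS omitted for brevity; a real etcd needs the cluster's client certs.
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// Current revision and on-disk db size before compaction.
	before, err := cli.Status(ctx, endpoint)
	if err != nil {
		panic(err)
	}
	fmt.Printf("before: %d bytes at revision %d\n", before.DbSize, before.Header.Revision)

	// Compact the revision history up to the current revision, then defragment
	// so the freed space is actually released from the bolt db file.
	if _, err := cli.Compact(ctx, before.Header.Revision); err != nil {
		panic(err)
	}
	if _, err := cli.Defragment(ctx, endpoint); err != nil {
		panic(err)
	}

	after, err := cli.Status(ctx, endpoint)
	if err != nil {
		panic(err)
	}
	fmt.Printf("after:  %d bytes\n", after.DbSize) // dropped to roughly 50 MB in our test
}
```

The fact that compaction reclaims almost everything suggests the growth is mostly revision history rather than live keys.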

Repro steps:

  • Create a new downstream cluster
  • Deploy bundles to the new cluster
  • Inspect the etcd db size on the local cluster and watch it grow to 4 GB in roughly 10 minutes (a sketch of this check follows the list)
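
A minimal sketch of the size check in the last step, again with the Go etcd client; the endpoint is a placeholder and TLS setup is omitted. It simply polls the maintenance Status call and prints the reported db size.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoint := "https://127.0.0.1:2379" // placeholder; local cluster etcd, client certs omitted

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Poll the db size every 30 seconds; in our repro it climbs to ~4 GB in ~10 minutes.
	for {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		status, err := cli.Status(ctx, endpoint)
		cancel()
		if err != nil {
			fmt.Println("status error:", err)
		} else {
			fmt.Printf("%s  db size: %.1f MiB\n",
				time.Now().Format(time.RFC3339), float64(status.DbSize)/(1<<20))
		}
		time.Sleep(30 * time.Second)
	}
}
```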

Workaround:

Is a workaround available and implemented? No.

Files, logs, traces:

Attached the output of an etcd keys script showing the total size of keys in etcd.
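
The script itself isn't reproduced here; the sketch below shows the same idea with the Go etcd client, summing live key and value sizes grouped by the path segment under /registry. Endpoint and TLS are placeholders. Note this only counts current values, not the revision history that compaction reclaims.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoint := "https://127.0.0.1:2379" // placeholder; client certs omitted

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Page through everything under /registry and aggregate sizes by the first
	// path segment below it (core resource type or API group, e.g.
	// /registry/secrets or /registry/fleet.cattle.io).
	sizes := map[string]int64{}
	key := "/registry/"
	end := clientv3.GetPrefixRangeEnd(key)
	for {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		resp, err := cli.Get(ctx, key,
			clientv3.WithRange(end),
			clientv3.WithLimit(1000),
			clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend))
		cancel()
		if err != nil {
			panic(err)
		}
		for _, kv := range resp.Kvs {
			group := string(kv.Key)
			if parts := strings.SplitN(group, "/", 4); len(parts) >= 3 {
				group = "/registry/" + parts[2] // "", "registry", "<segment>", rest
			}
			sizes[group] += int64(len(kv.Key) + len(kv.Value))
		}
		if !resp.More || len(resp.Kvs) == 0 {
			break
		}
		key = string(resp.Kvs[len(resp.Kvs)-1].Key) + "\x00" // continue after the last key
	}

	for group, size := range sizes {
		fmt.Printf("%-55s %12d bytes\n", group, size)
	}
}
```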

Additional notes:

Using ESO Helm chart 0.9.11.

After the etcd db size stabilizes at 4 GB, adding further clusters does increase the db size, but it does not double with each downstream cluster.

We are aware of #1650, but we don't see the error message below, which that issue mentions. Also, the ESO Helm chart is 1 MB in size, so not especially large.

level=fatal msg="Request entity too large: limit is 3145728"