[SURE-9487] Fleet takes long time to create a bundledeployment (180+ gitrepos) #3163

Open
mmartin24 opened this issue Dec 19, 2024 · 5 comments
@mmartin24
Collaborator

mmartin24 commented Dec 19, 2024

SURE-9487

Issue description:

At a larger gitrepo scale, applying a new gitrepo that uses labels takes a long time to create the bundledeployment.

Repro steps:

  • Create a large number of gitrepos (kubectl apply -f gitrepos.yaml); gitrepos.yaml is attached to the case.
  • Create one extra gitrepo whose targets use labels:
  paths:
    - /hello-world
  repo: https://github.com/rbreddy/bundledependency.git
  targets:
    - clusterSelector:
        matchLabels:
          foo: bar
  • Watch the resources:
kubectl get clusters -n fleet-default --output-watch-events --watch-only | ts '%Y-%m-%d %H:%M:%S' | tee -a output.txt
kubectl get bundles -n fleet-default --output-watch-events --watch-only | ts '%Y-%m-%d %H:%M:%S' | tee -a output.txt
  • Label a cluster so that the gitrepo targets it:
kubectl label cluster.management.cattle.io <CLUSTER-ID> -n fleet-default foo=bar --overwrite | ts '%Y-%m-%d %H:%M:%S' | tee -a output.txt
  • Watch the resources on the downstream (DS) cluster, e.g.:
kubectl get pods -n hello --watch | ts '%Y-%m-%d %H:%M:%S' | tee -a output.txt
  • Actual behavior:
    Fleet takes a long time to create the bundle deployments and start creating resources.

  • Expected behavior:
    Fleet should create the bundle deployments and start creating resources immediately.

  • Additional notes:
    With 300 gitrepos it takes 2 minutes

2024-12-04 15:48:57 cluster.management.cattle.io/c-m-hz9bmvz5 labeled
2024-12-04 15:48:57 EVENT NAME CLUSTERCLASS PHASE AGE VERSION
2024-12-04 15:48:57 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:48:57 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:48:57 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:48:57 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:29 EVENT NAME BUNDLEDEPLOYMENTS-READY STATUS
2024-12-04 15:50:29 MODIFIED bundledependency-hello-world 1/2 Pending(1) [Cluster fleet-default/test1]
2024-12-04 15:50:30 NAME READY STATUS RESTARTS AGE
2024-12-04 15:50:30 nginx-helloworld-67d799d946-xl94f 0/1 ContainerCreating 0 0s
2024-12-04 15:50:30 nginx-helloworld-67d799d946-xl94f 0/1 Pending 0 0s
2024-12-04 15:50:30 nginx-helloworld-67d799d946-xl94f 0/1 Pending 0 0s
2024-12-04 15:50:31 nginx-helloworld-67d799d946-xl94f 0/1 ContainerCreating 0 1s
2024-12-04 15:50:33 nginx-helloworld-67d799d946-xl94f 1/1 Running 0 3s
2024-12-04 15:50:38 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:38 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:38 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:38 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:50:39 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:53:28 MODIFIED test1 Provisioned 2d4h
2024-12-04 15:54:04 MODIFIED bundledependency-hello-world 2/2
  • With only one gitrepo, it starts within the next second:
2024-12-09 14:20:37 cluster.management.cattle.io/c-m-hz9bmvz5 labeled
2024-12-09 14:20:37 EVENT NAME BUNDLEDEPLOYMENTS-READY STATUS
2024-12-09 14:20:37 EVENT NAME BUNDLEDEPLOYMENTS-READY STATUS
2024-12-09 14:20:37 EVENT NAME CLUSTERCLASS PHASE AGE VERSION
2024-12-09 14:20:37 EVENT NAME CLUSTERCLASS PHASE AGE VERSION
2024-12-09 14:20:37 MODIFIED bundledeployment-hello-world 0/1 Pending(1) [Cluster fleet-default/test1]
2024-12-09 14:20:37 MODIFIED bundledeployment-hello-world 0/1 Pending(1) [Cluster fleet-default/test1]
2024-12-09 14:20:37 MODIFIED bundledeployment-hello-world 0/1 WaitApplied(1) [Cluster fleet-default/test1]
2024-12-09 14:20:37 MODIFIED bundledeployment-hello-world 0/1 WaitApplied(1) [Cluster fleet-default/test1]
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:37 MODIFIED test1 Provisioned 7d2h
2024-12-09 14:20:38 MODIFIED bundledeployment-hello-world 0/1 NotReady(1) [Cluster fleet-default/test1]; deployment.apps hello/nginx-helloworld [progressing] Deployment does not have minimum availability., Available: 0/1
2024-12-09 14:20:38 MODIFIED bundledeployment-hello-world 0/1 NotReady(1) [Cluster fleet-default/test1]; deployment.apps hello/nginx-helloworld [progressing] Deployment does not have minimum availability., Available: 0/1
2024-12-09 14:20:38 MODIFIED bundledeployment-hello-world 0/1 NotReady(1) [Cluster fleet-default/test1]; deployment.apps hello/nginx-helloworld [progressing] Deployment does not have minimum availability., Replicas: 0/1
2024-12-09 14:20:38 MODIFIED bundledeployment-hello-world 0/1 NotReady(1) [Cluster fleet-default/test1]; deployment.apps hello/nginx-helloworld [progressing] Deployment does not have minimum availability., Replicas: 0/1
2024-12-09 14:20:38 MODIFIED bundledeployment-hello-world 0/1 NotReady(1) [Cluster fleet-default/test1]; deployment.apps hello/nginx-helloworld [progressing] Replicas: 0/1
2024-12-09 14:20:38 MODIFIED bundledeployment-hello-world 0/1 NotReady(1) [Cluster fleet-default/test1]; deployment.apps hello/nginx-helloworld [progressing] Replicas: 0/1
2024-12-09 14:20:38 NAME READY STATUS RESTARTS AGE
2024-12-09 14:20:38 NAME READY STATUS RESTARTS AGE
2024-12-09 14:20:38 nginx-helloworld-67d799d946-lgrmt 0/1 ContainerCreating 0 0s
2024-12-09 14:20:38 nginx-helloworld-67d799d946-lgrmt 0/1 ContainerCreating 0 0s
2024-12-09 14:20:38 nginx-helloworld-67d799d946-lgrmt 0/1 Pending 0 0s
2024-12-09 14:20:38 nginx-helloworld-67d799d946-lgrmt 0/1 Pending 0 0s
2024-12-09 14:20:38 nginx-helloworld-67d799d946-lgrmt 0/1 Pending 0 0s
2024-12-09 14:20:38 nginx-helloworld-67d799d946-lgrmt 0/1 Pending 0 0s
2024-12-09 14:20:39 nginx-helloworld-67d799d946-lgrmt 0/1 ContainerCreating 0 1s
2024-12-09 14:20:39 nginx-helloworld-67d799d946-lgrmt 0/1 ContainerCreating 0 1s
2024-12-09 14:20:41 MODIFIED bundledeployment-hello-world 1/1
2024-12-09 14:20:41 MODIFIED bundledeployment-hello-world 1/1
2024-12-09 14:20:41 nginx-helloworld-67d799d946-lgrmt 1/1 Running 0 3s
2024-12-09 14:20:41 nginx-helloworld-67d799d946-lgrmt 1/1 Running 0 3s
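
To turn the timestamped watch output into a single number, the delay between the "labeled" line and the first matching bundle event can be computed with a few lines of Python. This is only a sketch: the function name `first_event_delay` is made up for illustration, and it assumes every relevant line starts with the `%Y-%m-%d %H:%M:%S` prefix that `ts` adds in the repro commands. The sample lines are copied from the 300-gitrepo log.

```python
from datetime import datetime


def first_event_delay(lines, start_marker, event_marker):
    """Return the seconds between the first line containing start_marker
    and the first later line containing event_marker (None if not found)."""
    start = None
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # not "<date> <time> <rest>"
        try:
            stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # line without a ts-style timestamp prefix
        if start is None:
            if start_marker in parts[2]:
                start = stamp
        elif event_marker in parts[2]:
            return (stamp - start).total_seconds()
    return None


# A few lines taken from the 300-gitrepo output.
sample = [
    "2024-12-04 15:48:57 cluster.management.cattle.io/c-m-hz9bmvz5 labeled",
    "2024-12-04 15:48:57 MODIFIED test1 Provisioned 2d4h",
    "2024-12-04 15:50:29 MODIFIED bundledependency-hello-world 1/2 Pending(1) [Cluster fleet-default/test1]",
    "2024-12-04 15:50:33 nginx-helloworld-67d799d946-xl94f 1/1 Running 0 3s",
]

print(first_event_delay(sample, "labeled", "bundledependency-hello-world"))  # 92.0
```

For a full run, the same function could be fed `open("output.txt")` instead of the sample list.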

/home/mmartin/Downloads/gitrepos-300.yaml

Environment:

Rancher Cluster:
Rancher version: 2.9.4
Number of nodes: 3
Node OS version:

Downstream Cluster:
Number of Downstream clusters: ~50
Node OS:
RKE/RKE2/K3S version: AKS
Kubernetes version: 1.28

@mmartin24 mmartin24 added this to Fleet Dec 19, 2024
@mmartin24 mmartin24 converted this from a draft issue Dec 19, 2024
@kkaempf kkaempf added the JIRA Must shout label Dec 19, 2024
@kkaempf kkaempf added this to the v2.11.0 milestone Dec 19, 2024
@manno manno moved this from 🆕 New to To Triage in Fleet Jan 15, 2025
@manno manno modified the milestones: v2.11.0, v2.11.1 Jan 15, 2025
@kkaempf kkaempf modified the milestones: v2.11.1, v2.11.0 Jan 21, 2025
@kkaempf
Collaborator

kkaempf commented Jan 21, 2025

raising prio, should go into 2.11.0 (with backport into 2.10 😉 )

@kkaempf kkaempf moved this from To Triage to 📋 Backlog in Fleet Jan 21, 2025
@manno
Member

manno commented Jan 22, 2025

Two minutes for 300 gitrepos to 50 clusters sounds OK to me. However, there is some suspicion that this is slow because of the label selector used in targeting. If the targeting loop alone takes two minutes, there is something wrong with the logic, e.g. we are not caching label selectors anymore.

@aruiz14
Contributor

aruiz14 commented Jan 23, 2025

I took a look at this and found the following info:

  • In the example, labels are added to a Cluster object from the management API, which are then propagated (almost immediately) to the Cluster object from the Fleet API group.
  • The Bundle reconciler, which watches for Cluster object changes, receives the event almost immediately as well.
  • This causes all bundles (3000 Bundles in the example reproducer) to be queued at once.
    • For some reason, I also observed that `EnqueueRequestsFromMapFunc` is called twice for a single event, but this may be irrelevant.
  • In my environment, every Bundle reconciliation took around 2.5s to process (with the default of 50 workers configured).
    • As a result, the desired BundleDeployment can take 1 minute or more to be created.
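
As a rough sanity check of the numbers above, the queue drain time can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not Fleet code: the function name is hypothetical, and it assumes every reconcile takes the same time, workers process the queue in parallel waves, and nothing is requeued (the possible double enqueue is ignored).

```python
import math


def worst_case_wait_seconds(queued, seconds_per_reconcile, workers):
    """Upper bound on the wait for the last queued item when `queued`
    items are processed in waves of `workers` parallel reconciles."""
    waves = math.ceil(queued / workers)
    return waves * seconds_per_reconcile


# Figures from the comment: 3000 Bundles, ~2.5s each, 50 workers.
print(worst_case_wait_seconds(3000, 2.5, 50))  # 150.0
```

Under these assumptions the last Bundle in the queue waits about 150 seconds, and a Bundle at a random position waits roughly half of that, which is in the same range as the delay reported in the issue.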

I did some further experiments (basically logging the duration of the different blocks within the reconciler function) and found that most of those 2.5s were spent on status Patch updates. I found a way to optimize them: #3245

@aruiz14
Contributor

aruiz14 commented Jan 24, 2025

/backport v2.10.3

@aruiz14
Contributor

aruiz14 commented Jan 24, 2025

/backport v2.9.7

Projects
Status: Needs QA review

4 participants