[SURE-9460] Fleet not picking up gitrepo updates, no job created to update #3138

Open
kkaempf opened this issue Dec 11, 2024 · 3 comments

@kkaempf
Collaborator

kkaempf commented Dec 11, 2024

SURE-9460

Issue description

After upgrading Rancher to 2.9.3 / Fleet to v0.10.4, some GitRepos no longer receive updates. The customer updates the repository, but the changes are not pushed to the clusters. No Job is created to pull in the changes that the GitRepo should track.

Fleet v0.10.4 included changes to how jobs are managed; #2932 seems to be one of them. Could these changes be the cause of this issue?

Business impact:

Applications that use Fleet for continuous delivery do not receive updates.

Troubleshooting steps:

The GitJob pod does not show jobs completing for those GitRepos. We are also unable to find jobs for the

Repro steps:

Upgrade to Rancher 2.9.3 from 2.9.2

Workaround:

Is a workaround available and implemented? Yes
What is the workaround:
The customer found that opening a GitRepo for editing in the Rancher UI and saving it without changing anything eventually causes the repo to pull the change and make the necessary updates.

Saving the GitRepo this way changes a few fields within it:

  • spec.correctDrift: {} is added
  • status.commit is updated
  • status.lastPollingTriggered is updated (the time changed by more than a day)

Actual behavior:

Repositories are not updated.

Expected behavior:

Repositories are updated.

@kkaempf kkaempf added this to the v2.10.2 milestone Dec 11, 2024
@kkaempf kkaempf added this to Fleet Dec 11, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Dec 11, 2024
@manno manno moved this from 🆕 New to 📋 Backlog in Fleet Dec 11, 2024
@carneiroskeeled
Contributor

carneiroskeeled commented Dec 17, 2024

@kkaempf have you tried enabling this?

  #  correctDrift:
  #    enabled: false

Or with the webhooks?

@manno
Member

manno commented Jan 6, 2025

We suspect that the GitRepo drops out of the new polling. The new polling is based on RequeueAfter and uses the reconciler's workqueue.

We suspect this could happen because:

  • a condition prevents the resource from being queued when it should be
  • the resource drops out of the requeue because of an unhandled error
  • there are not enough reconcile workers, so some reconciles get delayed indefinitely

Until we can reproduce this:

  • We are adding some jitter to the polling, so that 300 GitRepos don't reconcile at the same time.
  • We are shortening the resync period to pick up dropped GitRepos.
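
To illustrate the jitter idea, here is a minimal sketch in Python. It is not Fleet's implementation (Fleet is written in Go and its polling is based on RequeueAfter); the helper name `jittered_interval`, the 10% jitter fraction, and the 15-second base interval are all assumptions made up for this example. The point is only that each resource gets a slightly different delay, so they stop lining up on the same instant.

```python
import random


def jittered_interval(base_seconds, max_jitter_fraction=0.1, rng=random):
    """Hypothetical helper: return base_seconds plus a random extra delay
    of at most max_jitter_fraction * base_seconds."""
    if base_seconds < 0 or max_jitter_fraction < 0:
        raise ValueError("base_seconds and max_jitter_fraction must be non-negative")
    return base_seconds + rng.uniform(0, base_seconds * max_jitter_fraction)


# 300 repos that would otherwise all be requeued after exactly 15 seconds.
# The base interval and fraction are assumed values, not Fleet defaults.
rng = random.Random(42)  # seeded only so the example is reproducible
delays = [jittered_interval(15.0, 0.1, rng) for _ in range(300)]
print(f"min={min(delays):.2f}s max={max(delays):.2f}s")
```

With these assumed values every delay falls between 15 and 16.5 seconds, which spreads the 300 reconciles over a 1.5-second window instead of a single tick.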

@manno manno moved this from 📋 Backlog to 🏗 In progress in Fleet Jan 6, 2025
@manno manno assigned manno and 0xavi0 Jan 6, 2025
@rancher rancher deleted a comment from rancherbot Jan 9, 2025
@manno
Member

manno commented Jan 9, 2025

/backport v2.10.2

Projects
Status: Needs QA review