[backport v2.10] [SURE-9137] ClusterValues dont apply changes if one of the clusters is missing the templateValues #3190

Closed
1 task done
rancherbot opened this issue Jan 9, 2025 · 2 comments

@rancherbot
Collaborator

This is a backport issue for #2943, automatically created via GitHub Actions workflow initiated by @p-se

Original issue body:

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

If a GitRepo is configured to target two or more clusters and the fleet.yaml file includes ${ .ClusterValues}, missing templateValues in the spec of any one cluster prevent updates or changes from being deployed to all targeted clusters, including those where templateValues are properly configured.

Expected Behavior

  • The changes should be applied to the clusters where the templateValues are defined.
  • The UI should show a clear error message.

Steps To Reproduce

  1. Install Rancher 2.9.2 with Fleet v0.10.3.
  2. Register two downstream clusters, ensuring that only one of them includes templateValues:
apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  annotations:
  labels:
    foo: bar
    management.cattle.io/cluster-display-name: rke2custom1
    management.cattle.io/cluster-name: c-m-qmc767s2
    objectset.rio.cattle.io/hash: 464bd091084175e4d5572051571f4dfb39bcf2fd
    provider.cattle.io: rke2
  name: rke2custom1
  namespace: fleet-default
spec:
  agentAffinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: fleet.cattle.io/agent
                operator: In
                values:
                  - 'true'
          weight: 1
  clientID: pl882vs458n4lqqrj8jc58jvkvq4xgqdfv9l7q7spnrhh7s8wjgj8v
  kubeConfigSecret: rke2custom1-kubeconfig
  kubeConfigSecretNamespace: fleet-default
  templateValues:
    generated:
      cluster_metadata:
        fqdn: server-1.example.com
        name: server-1
  3. Create a GitRepo from this example path: templateValues
  4. Check the GitRepo dashboard for resourceReady.

Environment

- Architecture: x86_64
- Fleet Version: fleet:104.0.3+up0.10.3
- Cluster:
  - Provider: custom
  - Options: 1
  - Kubernetes Version: v1.30.5+rke2r1

Logs

From the fleet-controller logs:

2024-10-08T11:59:49Z	DEBUG	bundle	Unchanged bundledeployment	{"controller": "bundle", "controllerGroup": "fleet.cattle.io", "controllerKind": "Bundle", "Bundle": {"name":"mcc-rke2custom1-managed-system-upgrade-controller","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "mcc-rke2custom1-managed-system-upgrade-controller", "reconcileID": "04c5c324-f0f4-4f19-bc31-1e11a890da3e", "bundledeployment": {"apiVersion": "fleet.cattle.io/v1alpha1", "kind": "BundleDeployment", "namespace": "cluster-fleet-default-rke2custom1-43138de7906f", "name": "mcc-rke2custom1-managed-system-upgrade-controller"}, "operation": "unchanged"}
2024-10-08T11:59:49Z	DEBUG	bundle	Unchanged bundledeployment	{"controller": "bundle", "controllerGroup": "fleet.cattle.io", "controllerKind": "Bundle", "Bundle": {"name":"fleet-agent-rke2custom1","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "fleet-agent-rke2custom1", "reconcileID": "d63cdb5d-544d-4356-b269-350b5564aa21", "bundledeployment": {"apiVersion": "fleet.cattle.io/v1alpha1", "kind": "BundleDeployment", "namespace": "cluster-fleet-default-rke2custom1-43138de7906f", "name": "fleet-agent-rke2custom1"}, "operation": "unchanged"}
2024-10-08T11:59:49Z	ERROR	Reconciler error	{"controller": "bundle", "controllerGroup": "fleet.cattle.io", "controllerKind": "Bundle", "Bundle": {"name":"templatevalues-templatevalues-5bfacaa9","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "templatevalues-templatevalues-5bfacaa9", "reconcileID": "2a8aaea7-2194-46c2-a923-bf6f745b1a4a", "error": "failed to render helm values template: template: values:56:40: executing \"values\" at <.ClusterValues.generated.cluster_metadata.fqdn>: map has no entry for key \"generated\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222
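
The "map has no entry for key" message in the log above is what Go's text/template package reports when a map lookup fails under the missingkey=error option. The following is a minimal sketch of that behavior, not Fleet's actual rendering code: the renderValues helper, the "${" and "}" delimiters, and the sample template are assumptions chosen to mirror the syntax and error text seen in this issue.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderValues is a hypothetical helper: it renders tmplText with the given
// cluster values exposed as .ClusterValues. The delimiters and the
// missingkey=error option are assumptions that mirror this issue.
func renderValues(tmplText string, clusterValues map[string]interface{}) (string, error) {
	tmpl, err := template.New("values").
		Delims("${", "}").
		Option("missingkey=error"). // fail instead of printing "<no value>"
		Parse(tmplText)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	data := map[string]interface{}{"ClusterValues": clusterValues}
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("failed to render helm values template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	const tmplText = "fqdn: ${ .ClusterValues.generated.cluster_metadata.fqdn }"

	// Cluster with templateValues, as in the Cluster resource above.
	withValues := map[string]interface{}{
		"generated": map[string]interface{}{
			"cluster_metadata": map[string]interface{}{
				"fqdn": "server-1.example.com",
				"name": "server-1",
			},
		},
	}
	// Cluster without any templateValues.
	withoutValues := map[string]interface{}{}

	for _, values := range []map[string]interface{}{withValues, withoutValues} {
		out, err := renderValues(tmplText, values)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

With the second, empty map, Execute returns an error ending in: map has no entry for key "generated". Without missingkey=error, the same lookup would not fail at that key, which is why the option matters for catching misconfigured clusters.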

Anything else?

current behavior:
image

@rancherbot rancherbot added JIRA Must shout kind/bug labels Jan 9, 2025
@rancherbot rancherbot added this to the v2.10.2 milestone Jan 9, 2025
@rancherbot rancherbot added this to Fleet Jan 9, 2025
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Jan 9, 2025
@p-se p-se moved this from 🆕 New to 👀 In review in Fleet Jan 9, 2025
@weyfonk
Contributor

weyfonk commented Jan 13, 2025

Additional QA

Problem

When a workload targets multiple clusters, and one of those clusters is missing a template value, the following happens:

  • the workload is not deployed to any of the target clusters
  • a reconcile error appears, but only in the fleet-controller pod logs; it is not visible in the Rancher UI.

Solution

Fleet now reflects targeting errors, such as those caused by missing template values on clusters, in the bundle and GitRepo statuses.
Fleet deliberately refrains from creating bundle deployments even for the clusters that have no targeting issues (see this comment). A bundle working with only a subset of its expected bundle deployments would likely cause inconsistencies in resource counts and a possible cascade of other issues. This could be revisited in a further iteration.
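
The all-or-nothing behavior described above can be sketched as follows. This is an illustration under stated assumptions, not Fleet's implementation: Target, renderFQDN, and planDeployments are hypothetical names, and collecting the errors from every target (rather than stopping at the first) is an assumption. It uses errors.Join, which requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Target is a hypothetical stand-in for a cluster matched by a bundle.
type Target struct {
	Cluster string
	Values  map[string]interface{}
}

// renderFQDN is a hypothetical renderer that needs an "fqdn" template value.
func renderFQDN(t Target) (string, error) {
	fqdn, ok := t.Values["fqdn"].(string)
	if !ok {
		return "", errors.New(`missing template value "fqdn"`)
	}
	return "fqdn: " + fqdn, nil
}

// planDeployments renders values for every target before creating anything.
// If any target fails, it returns no deployments at all, plus a combined
// error that a caller could surface in a status field.
func planDeployments(targets []Target, render func(Target) (string, error)) (map[string]string, error) {
	rendered := make(map[string]string, len(targets))
	var errs []error
	for _, t := range targets {
		out, err := render(t)
		if err != nil {
			errs = append(errs, fmt.Errorf("cluster %s: %w", t.Cluster, err))
			continue
		}
		rendered[t.Cluster] = out
	}
	if len(errs) > 0 {
		return nil, errors.Join(errs...) // all or nothing
	}
	return rendered, nil
}

func main() {
	targets := []Target{
		{Cluster: "rke2custom1", Values: map[string]interface{}{"fqdn": "server-1.example.com"}},
		{Cluster: "rke2custom2", Values: map[string]interface{}{}},
	}
	plan, err := planDeployments(targets, renderFQDN)
	if err != nil {
		fmt.Println("targeting error:", err)
	}
	fmt.Println("deployments planned:", len(plan))
}
```

In this sketch, the second target's missing value leaves the plan empty even though the first target rendered successfully, which matches the "no bundle deployments are created" check in the QA considerations.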

Testing

Engineering Testing

Manual Testing

N/A

Automated Testing

End-to-end tests have been added to check for the presence of targeting errors in bundle and GitRepo statuses.

QA Testing Considerations

Suggestion: follow the reproduction steps above, and check that:

  • targeting errors appear in the Rancher UI
  • no bundle deployments are created

Regressions Considerations

N/A

@weyfonk weyfonk moved this from 👀 In review to Needs QA review in Fleet Jan 13, 2025
@sbulage
Contributor

sbulage commented Jan 13, 2025

System information

Before Upgrade

Rancher Version: Prime v2.10.1
Fleet Version: 0.11.2

Steps used to perform

  • Verified that no resources are created on any cluster.
  • No error message is shown.

Note: The steps mentioned in the description were performed both before and after the upgrade.


After Upgrade

Rancher Version: v2.10-abac32fbae48418718f1515781875f3f3d6b7351-head
Fleet Version: fleet:v0.11.3-rc.1

Steps used to perform

  • Upgraded the same cluster to v2.10-abac32fbae48418718f1515781875f3f3d6b7351-head.
  • Navigated to Continuous Delivery --> GitRepo.
  • An error message is shown on the GitRepo page (see the screenshot below).
  • No BundleDeployment is created for the same GitRepo.

Screenshot showing the template error message on the GitRepo

Screenshot from 2025-01-13 21-03-15

Screenshot showing that no BundleDeployment is created for the GitRepo

Screenshot from 2025-01-13 21-02-51


The video below shows the upgrade from Prime v2.10.1 to the v2.10-head version.

Video shows GitRepo, before and after upgrade
cluster_target_message_2.10.mp4

@sbulage sbulage closed this as completed Jan 13, 2025
@github-project-automation github-project-automation bot moved this from Needs QA review to ✅ Done in Fleet Jan 13, 2025