[BUG] GitOps updating the composition claim does not trigger reconciliation #96

Open
JayChanggithub opened this issue Jan 10, 2025 · 10 comments

@JayChanggithub

JayChanggithub commented Jan 10, 2025

Describe the bug

We encountered a strange situation: after updating the claim values in the Helm repository and confirming that the composition was rendered correctly with all statuses ready, reconciliation was not triggered. The issue persisted until we deleted the composite resource (XR).

Tested with Provider Version

apiServer:
  type: GardenerDedicated
btpServiceOperator:
  version: 0.6.8
externalSecretsOperator:
  version: 0.10.7
flux:
  version: 2.14.0
crossplane:
  version: 1.18.0
  providers:
    - name: btp
      version: 1.0.2
    - name: kubernetes
      version: 0.15.0
    - name: ias
      version: 0.2.0
    - name: vault
      version: 1.0.0
    - name: btp-account
      version: 0.7.6
    - name: argocd
      version: 0.9.1

To Reproduce

Steps to reproduce the behavior:

  1. Clone the Git repository, update values.yaml to the desired values, and create a PR.
# from 
kymaParameters:
  region: eastus
  machineType: Standard_D8s_v5
  autoScalerMin: 3
  autoScalerMax: 4

# to 
kymaParameters:
  region: eastus
  machineType: Standard_D8s_v5
  autoScalerMin: 3
  autoScalerMax: 5
  2. Merge to the main branch to trigger the FluxCD GitOps mechanism.
  3. Run several commands to verify the state, as shown below.
$ k get helmreleases.helm.toolkit.fluxcd.io
helmrelease   2d8h   True    Helm upgrade succeeded for release default/default-helmrelease.v11 with chart [email protected]+9c6d1e235c9e

$ k get xrd       
xbtpsubaccounts.sre.tools.sap     True          True      2d8h
xkymaenvironments.sre.tools.sap   True          True      2d8h

$ k get kymaenvironments.sre.tools.sap k-sre-slow-us21-co   # the latest values were rendered successfully
$ k get kymaenvironments.environment.btp.orchestrate.cloud.sap k-sre-slow-us21-co -oyaml   # not yet changed
  4. Reconciliation is only triggered after deleting the composite resource and waiting a while. However, this deletes the Kyma resources and then recreates them.
$ k delete xkymaenvironments.sre.tools.sap k-sre-slow-us21-co-kn7t5
  5. The Kyma resources in the BTP Cockpit then show that reconciliation and processing were triggered successfully.

Expected behavior

It should reconcile to the latest values automatically, rather than requiring manual deletion.

Additional context

# mcp
project: iccx-sre-us21
k get workspace -n project-iccx-sre-us21 slow 
k get mcp -n project-iccx-sre-us21--ws-slow 

# GitOps Repository 
https://github.tools.sap/rgm/infra-cloud-orchestrator-rgm
@sdischer-sap
Member

Just to diagnose this issue correctly, could you please confirm my understanding:

So you deployed those changes via Git, did a rollout, and the expected changes were reflected in the forProvider section but not in atProvider. Also, I guess you waited a few minutes before testing with delete, right?
Then you triggered a delete via kubectl, so the whole instance was deleted in BTP and created again from scratch.

So basically what you are saying is that drift detection in general isn't working on the Kyma instance, is that right?
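
One way to check the forProvider versus atProvider question above is to diff the two sections of the managed resource directly. The following is a minimal Python sketch, not part of the provider. It assumes that `status.atProvider.parameters` is a JSON-encoded string and that `spec.forProvider.parameters` is a plain mapping; the function name `provider_drift` and the trimmed example resource are hypothetical.

```python
import json


def provider_drift(managed_resource: dict) -> dict:
    """Return {key: (desired, observed)} for each forProvider parameter
    whose value differs from the one reported in status.atProvider."""
    desired = managed_resource["spec"]["forProvider"]["parameters"]
    # Assumption: atProvider.parameters is a JSON string, not a mapping.
    observed = json.loads(managed_resource["status"]["atProvider"]["parameters"])
    # Only keys set in forProvider are compared; extra keys that exist
    # only on the provider side are ignored.
    return {
        key: (value, observed.get(key))
        for key, value in desired.items()
        if observed.get(key) != value
    }


# Hypothetical, trimmed-down resource used only for illustration.
example = {
    "spec": {
        "forProvider": {"parameters": {"autoScalerMin": 3, "autoScalerMax": 5}}
    },
    "status": {
        "atProvider": {
            "parameters": '{"autoScalerMin": 3, "autoScalerMax": 4, "name": "demo"}'
        }
    },
}

print(provider_drift(example))  # {'autoScalerMax': (5, 4)}
```

The input dictionary could come from `kubectl get <resource> <name> -o json` parsed with `json.load`. An empty result would suggest that the desired state already matches what BTP reports.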

@JayChanggithub
Author

@sdischer-sap
Yes, that’s almost correct. We are not sure what exactly the behavior is; we expected FluxCD GitOps to apply our desired state. Do you have any further recommendations? Thanks.

@sdischer-sap sdischer-sap self-assigned this Jan 20, 2025
@JayChanggithub
Author

JayChanggithub commented Jan 20, 2025

The dedicated GitOps repository is here: https://github.tools.sap/rgm/infra-cloud-orchestrator-rgm. We are also seeing the empty clientID error again.

Image

@sdischer-sap
Member

sdischer-sap commented Jan 20, 2025

@JayChanggithub I just tried to reproduce the reconciliation manually with a plain KymaEnvironment and everything works as expected. The values are updated as they should be (it takes about half an hour though, since Kyma takes that long).

Do you have an instance in an MCP that has not been updated, which I can take a look at? I mean somewhere I can see that the changes in the spec were applied but the update in BTP is not happening.

@JayChanggithub
Author

Hi @sdischer-sap Yes, we have two MCPs: one is dev, and the other is slow. Would you be able to access them and check? Additionally, any changes made through the GitOps approach are managed in the dedicated repository here: GitHub Link.

Project: iccx-sre-us21
Workspaces: dev / slow (k get workspace -n project-iccx-sre-us21)
MCPs: project-iccx-sre-us21--ws-dev / project-iccx-sre-us21--ws-slow

Let me know if you need further details!

@JayChanggithub
Author

@sdischer-sap Additionally, we have adopted gitrepositories.source.toolkit.fluxcd.io and helmreleases.helm.toolkit.fluxcd.io to implement GitOps, ensuring a consistent reconciliation lifecycle. We are not making any changes manually. Thanks.

 k get helmreleases.helm.toolkit.fluxcd.io 
NAME          AGE   READY   STATUS
helmrelease   12d   True    Helm upgrade succeeded for release default/default-helmrelease.v56 with chart [email protected]+3b8b7c559779

k get gitrepositories.source.toolkit.fluxcd.io   
NAME                 URL                                                         AGE   READY   STATUS
k-sre-slow-us21-co   https://github.tools.sap/rgm/infra-cloud-orchestrator-rgm   12d   True    stored artifact for revision 'main@sha1:3b8b7c55977997678454574d1c8b130fd5b3c9df'

@JayChanggithub
Author

@sdischer-sap
I encountered a strange situation involving the Kyma environments. The kymaenvironments.sre.tools.sap, which is provisioned by our composition, contains the expected content. However, the actual resources, such as kymaenvironments.environment.btp.orchestrate.cloud.sap, do not match the expected state.

k get kymaenvironments.sre.tools.sap -oyaml  

apiVersion: v1
items:
- apiVersion: sre.tools.sap/v1alpha1
  kind: KymaEnvironment
  metadata:
    annotations:
      meta.helm.sh/release-name: default-helmrelease
      meta.helm.sh/release-namespace: default
    creationTimestamp: "2025-01-20T09:42:17Z"
    finalizers:
    - finalizer.apiextensions.crossplane.io
    generation: 8
    labels:
      app.kubernetes.io/managed-by: Helm
      helm.toolkit.fluxcd.io/name: helmrelease
      helm.toolkit.fluxcd.io/namespace: default
    name: k-sre-slow-us21-co
    namespace: default
    resourceVersion: "10972709"
    uid: 897966ec-b601-4ef9-98f2-a1f574f2db46
  spec:
    compositeDeletePolicy: Foreground
    compositionRef:
      name: sre-kyma-environment
    compositionRevisionRef:
      name: sre-kyma-environment-f48343a
    compositionUpdatePolicy: Automatic
    kymaParameters:
      administrators:
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      autoScalerMax: 9
      autoScalerMin: 3
      machineType: Standard_D8s_v5
      region: eastus
    resourceRef:
      apiVersion: sre.tools.sap/v1alpha1
      kind: XKymaEnvironment
      name: k-sre-slow-us21-co-72vvt
    servicePlanName: azure
    subaccountName: k-sre-slow-us21-co
  status:
    conditions:
    - lastTransitionTime: "2025-01-20T09:42:17Z"
      reason: ReconcileSuccess
      status: "True"
      type: Synced
    - lastTransitionTime: "2025-01-20T09:42:17Z"
      message: Claim is waiting for composite resource to become Ready
      reason: Waiting
      status: "False"
      type: Ready
kind: List
metadata:
  resourceVersion: ""
k get kymaenvironments.environment.btp.orchestrate.cloud.sap -oyaml     
apiVersion: v1
items:
- apiVersion: environment.btp.orchestrate.cloud.sap/v1alpha1
  kind: KymaEnvironment
  metadata:
    annotations:
      crossplane.io/composition-resource-name: Kyma Environment
      crossplane.io/external-create-failed: "2025-01-20T09:42:20Z"
      crossplane.io/external-create-pending: "2025-01-20T09:42:25Z"
      crossplane.io/external-create-succeeded: "2025-01-20T09:42:29Z"
      crossplane.io/external-name: k-sre-slow-us21-co
    creationTimestamp: "2025-01-20T09:42:18Z"
    finalizers:
    - finalizer.managedresource.crossplane.io
    generateName: k-sre-slow-us21-co-72vvt-
    generation: 2
    labels:
      crossplane.io/claim-name: k-sre-slow-us21-co
      crossplane.io/claim-namespace: default
      crossplane.io/composite: k-sre-slow-us21-co-72vvt
    name: k-sre-slow-us21-co
    ownerReferences:
    - apiVersion: sre.tools.sap/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: XKymaEnvironment
      name: k-sre-slow-us21-co-72vvt
      uid: c1d561e2-093d-443c-b231-58dbfc84d869
    resourceVersion: "10861529"
    uid: 0732bcda-adda-4565-b564-4b7c73464e38
  spec:
    cloudManagementRef:
      name: k-sre-slow-us21-co
    cloudManagementSecret: k-sre-slow-us21-co-cis-local
    cloudManagementSecretNamespace: default
    cloudManagementSubaccountGuid: 2dbb3308-71c6-4b1b-bdbf-9e83ccc4136a
    deletionPolicy: Delete
    forProvider:
      parameters:
        administrators:
        - [email protected]
        - [email protected]
        - [email protected]
        - [email protected]
        - [email protected]
        - [email protected]
        - [email protected]
        autoScalerMax: 7
        autoScalerMin: 3
        machineType: Standard_D8s_v5
        oidc:
          clientID: ""
          groupsClaim: groups
          issuerURL: ""
          signingAlgs:
          - RS256
          usernameClaim: email
          usernamePrefix: '-'
        region: eastus
      planName: azure
    managementPolicies:
    - '*'
    providerConfigRef:
      name: account-provider-config
    subaccountGuid: 2dbb3308-71c6-4b1b-bdbf-9e83ccc4136a
    subaccountRef:
      name: k-sre-slow-us21-co
    writeConnectionSecretToRef:
      name: k-sre-slow-us21-co-kyma-environment
      namespace: default
  status:
    atProvider:
      brokerId: 1B1719DE-CC81-4686-A9E2-4631AA82D2BD
      createdDate: "1737366175744.000000"
      customLabels: {}
      description: created via crossplane-provider-btp-account
      environmentType: kyma
      globalAccountGUID: 2291c59f-008e-432d-a082-e3b49f9b5e26
      id: A466E72F-AC57-4DD7-9C45-BE462EBD03AD
      modifiedDate: "1737366175744.000000"
      name: k-sre-slow-us21-co
      parameters: '{"administrators":["[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]"],"autoScalerMax":7,"autoScalerMin":3,"machineType":"Standard_D8s_v5","name":"k-sre-slow-us21-co","oidc":{"clientID":"","groupsClaim":"groups","issuerURL":"","signingAlgs":["RS256"],"usernameClaim":"email","usernamePrefix":"-"},"orchestrate.cloud.sap/subaccount-operator":"0732bcda-adda-4565-b564-4b7c73464e38","region":"eastus"}'
      planId: 4deee563-e5ec-4731-b9b1-53b42d855f0c
      planName: azure
      platformId: 75682e6d-bc4c-4e2f-93f1-1dee1f1d49bf
      serviceId: 47c9dcbf-ff30-448e-ab36-d3bad66ba281
      serviceName: kymaruntime
      state: CREATION_FAILED
      stateMessage: clientID must not be empty, issuerURL must not be empty
      subaccountGUID: 2dbb3308-71c6-4b1b-bdbf-9e83ccc4136a
      tenantId: 2dbb3308-71c6-4b1b-bdbf-9e83ccc4136a
      type: Provision
    conditions:
    - lastTransitionTime: "2025-01-20T09:42:39Z"
      reason: Unavailable
      status: "False"
      type: Ready
    - lastTransitionTime: "2025-01-20T09:42:29Z"
      reason: ReconcileSuccess
      status: "True"
      type: Synced
kind: List
metadata:
  resourceVersion: ""
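
The two dumps above differ in one parameter: the claim carries autoScalerMax: 9 while the managed resource still has autoScalerMax: 7. A small Python sketch can make that comparison repeatable. It assumes the composition copies `spec.kymaParameters` from the claim one-to-one into `spec.forProvider.parameters` of the managed resource, which the dumps suggest but do not prove. The function name `claim_to_managed_diff` is hypothetical, and the administrators list is omitted for brevity.

```python
def claim_to_managed_diff(claim: dict, managed: dict) -> dict:
    """Return {key: (claim_value, managed_value)} for each claim parameter
    that has not arrived unchanged in the managed resource."""
    wanted = claim["spec"]["kymaParameters"]
    # Assumption: the composition maps kymaParameters 1:1 to forProvider.parameters.
    rendered = managed["spec"]["forProvider"]["parameters"]
    return {
        key: (value, rendered.get(key))
        for key, value in wanted.items()
        if rendered.get(key) != value
    }


# Values copied from the two dumps in this comment (administrators omitted).
claim = {
    "spec": {
        "kymaParameters": {
            "autoScalerMax": 9,
            "autoScalerMin": 3,
            "machineType": "Standard_D8s_v5",
            "region": "eastus",
        }
    }
}
managed = {
    "spec": {
        "forProvider": {
            "parameters": {
                "autoScalerMax": 7,
                "autoScalerMin": 3,
                "machineType": "Standard_D8s_v5",
                "region": "eastus",
            }
        }
    }
}

print(claim_to_managed_diff(claim, managed))  # {'autoScalerMax': (9, 7)}
```

A non-empty result here points at the hop from claim to managed resource, which is a different place from a mismatch between forProvider and atProvider.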

@sdischer-sap
Member

I am confused here.
In project-iccx-sre-us21--ws-slow I don't see any kymaenvironment or related claim.
In project-iccx-sre-us21--ws-dev there is one, but it failed due to a missing subaccountID (because the referenced subaccount was also not created).

Also, the last snippet you posted shows an error related to an unset clientID, which does not really relate to this problem.

Lastly, please note that you are still using the old preview landscape as well as the old inner-source BTP provider, neither of which is really supported anymore. Since you are having issues with the setup, maybe it's a good time to migrate to the new canary or live landscape and use the open-source provider. That is definitely better than doing it while you have a bigger landscape up and running.
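
The clientID error mentioned above comes from the stateMessage in the posted resource ("clientID must not be empty, issuerURL must not be empty"). A pre-flight check on the rendered parameters could catch this before the provider submits them. This is a minimal Python sketch under the assumption that exactly those two OIDC fields are required, as the message suggests; the function name `empty_oidc_fields` is hypothetical.

```python
def empty_oidc_fields(parameters: dict, required=("clientID", "issuerURL")) -> list:
    """Return the required OIDC fields that are missing or empty."""
    oidc = parameters.get("oidc") or {}
    return [name for name in required if not oidc.get(name)]


# OIDC block copied from the forProvider.parameters section posted in this thread.
params = {
    "oidc": {
        "clientID": "",
        "groupsClaim": "groups",
        "issuerURL": "",
        "signingAlgs": ["RS256"],
        "usernameClaim": "email",
        "usernamePrefix": "-",
    }
}

print(empty_oidc_fields(params))  # ['clientID', 'issuerURL']
```

Whether an empty `oidc` block should be dropped or filled with real values depends on the composition, which is not shown in this thread.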

@JayChanggithub
Author

@sdischer-sap
Thank you. First, I’d like to confirm how to determine whether it’s a preview, live, or canary landscape. We are using the base kubeconfig, as outlined in the documentation here, specifically under the "live" section to create the MCP.

Based on my understanding, it aligns with the live landscape. Additionally, both project-iccx-sre-us21--ws-dev and project-iccx-sre-us21--ws-slow are provided by the live environment, as shown in the relevant information below.

Currently, we are using Helm GitOps for project-iccx-sre-us21--ws-dev, but it does not include composition resources that can be reconciled through GitOps. On the other hand, the resources for project-iccx-sre-us21--ws-slow are claimed via composition, but they also cannot be reconciled through GitOps.

The GitOps repository in use can be found here: https://github.tools.sap/rgm/infra-cloud-orchestrator-rgm.git. Sorry that the missing context caused a misunderstanding.

k get project iccx-sre-us21  

NAME            DISPLAY NAME                   RESULTING NAMESPACE     AGE
iccx-sre-us21   Cloud Orchestrator IaD setup   project-iccx-sre-us21   22d
k get workspace -n project-iccx-sre-us21         
NAME   DISPLAY NAME                     RESULTING NAMESPACE              AGE
dev    Cloud Orchestrator Development   project-iccx-sre-us21--ws-dev    30h
slow   Cloud Orchestrator Development   project-iccx-sre-us21--ws-slow   13d

@sdischer-sap
Member

sdischer-sap commented Jan 22, 2025

Sorry, my bad about the landscape. If you are using the kubeconfig under live, then that's perfectly fine. I was confused before because I found some MCPs with exactly the same names in the old preview landscape, but I guess those are outdated then.

That being said, on the MCPs in live I still can't find the resources you listed above.
In project-iccx-sre-us21--ws-slow I do not see any KymaEnvironment. In project-iccx-sre-us21--ws-dev I see one, but it has different parameters (e.g. a clientID) and status.

Let's focus on exactly one resource that you currently have live in a cluster. So please provide again:

  • the describe output of the KymaEnvironment that does not properly apply the update
  • the name of the MCP where I can find this resource
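
Alongside the describe output requested above, a condensed view of the Synced and Ready conditions can make a stuck resource easier to spot. The following minimal Python sketch assumes the resource has been exported as JSON and follows the `status.conditions` layout seen in the dumps in this thread; the function names `summarize_conditions` and `is_healthy` are hypothetical.

```python
def summarize_conditions(resource: dict) -> dict:
    """Map each condition type to a (status, reason) tuple."""
    conditions = resource.get("status", {}).get("conditions", [])
    return {c["type"]: (c["status"], c.get("reason", "")) for c in conditions}


def is_healthy(resource: dict) -> bool:
    """True only if both Synced and Ready are reported as "True"."""
    summary = summarize_conditions(resource)
    return all(
        summary.get(kind, ("False", ""))[0] == "True" for kind in ("Synced", "Ready")
    )


# Conditions copied from the claim dump posted earlier in this thread.
claim = {
    "status": {
        "conditions": [
            {"type": "Synced", "status": "True", "reason": "ReconcileSuccess"},
            {"type": "Ready", "status": "False", "reason": "Waiting"},
        ]
    }
}

print(summarize_conditions(claim))
# {'Synced': ('True', 'ReconcileSuccess'), 'Ready': ('False', 'Waiting')}
print(is_healthy(claim))  # False
```

A resource that is Synced but not Ready, as in this example, was accepted by its controller but has not reached a usable state.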
