[plugin-chart-test] Improve Helm Test Cleanup and Logging #589

Open
4 of 6 tasks
ibakshay opened this issue Jan 23, 2025 · 3 comments · May be fixed by cloudoperators/greenhouse#923

ibakshay (Contributor) commented Jan 23, 2025

Priority

None

Is your feature request related to a problem?

Currently, when a Helm chart test fails:

  1. Resources created by the test are not immediately cleaned up, leaving debris in the cluster
  2. Logs from the test pod are not added to the plugin resource status field. This forces users to manually run helm test in remote clusters to debug test failures, leading to delays in troubleshooting.

Acceptance Criteria

  • Test Pod Logs in Plugin CR Status Field:

    • Logs from the Helm test pod are captured upon test completion and appended to the status field of the plugin custom resource (CR).
    • Logs are concise, well-formatted, and user-friendly for quick debugging.
    • Users can view the test logs directly in the plugin CR status without manually interacting with the cluster.
  • Resource Cleanup:

    • All resources created during a Helm test are cleaned up after the plugin is removed.
    • Cleanup logic addresses ConfigMaps, Pods, RBAC permissions, and any other resources created during tests (a sketch follows this list).
  • Update Documentation

    • Documentation is updated accordingly.
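
Helm can delete test hook resources on its own via the helm.sh/hook-delete-policy annotation, but anything that survives the test run would need explicit cleanup in the controller. Below is a minimal Go sketch of such a cleanup step; it assumes test resources carry an identifying label, and the label key and helper name are hypothetical rather than an existing project convention.

package controller // hypothetical package name for this sketch

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    rbacv1 "k8s.io/api/rbac/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupHelmTestResources removes leftovers of a Helm chart test run.
// The "greenhouse.sap/helm-test" label is an assumption of this sketch,
// not an existing convention of the project.
func cleanupHelmTestResources(ctx context.Context, c client.Client, namespace string) error {
    selector := client.MatchingLabels{"greenhouse.sap/helm-test": "true"}
    inNS := client.InNamespace(namespace)

    // Pods and ConfigMaps created by the test hooks.
    if err := c.DeleteAllOf(ctx, &corev1.Pod{}, inNS, selector); err != nil {
        return fmt.Errorf("deleting test pods: %w", err)
    }
    if err := c.DeleteAllOf(ctx, &corev1.ConfigMap{}, inNS, selector); err != nil {
        return fmt.Errorf("deleting test configmaps: %w", err)
    }
    // RBAC objects created for the test are removed the same way.
    if err := c.DeleteAllOf(ctx, &rbacv1.RoleBinding{}, inNS, selector); err != nil {
        return fmt.Errorf("deleting test rolebindings: %w", err)
    }
    return nil
}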

Additional context

No response

Reference Issues

Depends on cloudoperators/greenhouse#585

@ibakshay ibakshay self-assigned this Jan 23, 2025
@ibakshay ibakshay moved this from No status to Backlog in Observability Roadmap Jan 23, 2025
ibakshay (Contributor, Author) commented:

@richardtief and I aligned on introducing a separate status field in the Plugin for the Helm chart test results. The status field would look like this:

 status:
  helmChartTestSucceeded:
    lastTransitionTime: "2025-02-13T21:28:47Z"
    status: "False"
    conditions: 
     - message: Verify that there is a service named perses
       succeeded: true
     - message: Verify that there is a configmap named perses
       succeeded: true
     - message: Verify failure deployment and running status of the perses pod
       succeeded: false

@IvoGoman Do you have any comments on this structure?
This is an enhancement to this issue cloudoperators/greenhouse#106
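
For illustration only, a rough Go sketch of how the proposed field could be declared in the Plugin API types, assuming kubebuilder-style CRD types; the type and field names simply mirror the YAML above and are not a final API.

package v1alpha1 // hypothetical API package for this sketch

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// HelmChartTestResult captures the outcome of a single Helm test case.
type HelmChartTestResult struct {
    // Message describes the test case.
    Message string `json:"message"`
    // Succeeded reports whether the test case passed.
    Succeeded bool `json:"succeeded"`
}

// HelmChartTestStatus aggregates the result of a Helm chart test run.
type HelmChartTestStatus struct {
    // Status is "True" if all test cases succeeded, "False" otherwise.
    Status metav1.ConditionStatus `json:"status"`
    // LastTransitionTime records when the overall status last changed.
    LastTransitionTime metav1.Time `json:"lastTransitionTime"`
    // Conditions lists the individual test case results.
    Conditions []HelmChartTestResult `json:"conditions,omitempty"`
}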

IvoGoman (Contributor) commented:

I am not sure about adding an additional field to the Plugin status. We already have this condition as part of Plugin.status.statusConditions. If this needs to be visible in the Dashboard, it would require additional effort to visualize a dedicated field for the Helm chart tests. It would no longer be obvious how the overall Ready condition is calculated. Internally in the controller it could be done, but the connection between the statusConditions and the dedicated helmChartTestSucceeded field would not be obvious.

I would prefer reusing the already existing condition in the StatusConditions. Ideally, the message could contain a comma-separated list of the failed test cases? This would fit in with the current dashboard and also make the condition easier to parse, as it only lists the failed tests. The nested structure requires reading through all conditions and looking for the one(s) that failed. Depending on the number of tests, this list could also become quite long, and for debugging only the failed ones are relevant.
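
As a rough illustration of this alternative (not the project's actual code), the existing condition could carry only the failed test names in its message; the sketch below assumes apimachinery's condition helpers and plain metav1.Condition, while the condition type and reasons are placeholders.

package controller // hypothetical package name for this sketch

import (
    "strings"

    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setHelmChartTestCondition folds the Helm test outcome into the existing
// status conditions, listing only the failed test cases in the message.
func setHelmChartTestCondition(conditions *[]metav1.Condition, failedTests []string) {
    cond := metav1.Condition{
        Type:   "HelmChartTestSucceeded", // assumed condition type
        Status: metav1.ConditionTrue,
        Reason: "HelmChartTestSucceeded",
    }
    if len(failedTests) > 0 {
        cond.Status = metav1.ConditionFalse
        cond.Reason = "HelmChartTestFailed"
        cond.Message = "failed tests: " + strings.Join(failedTests, ", ")
    }
    meta.SetStatusCondition(conditions, cond)
}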

ibakshay (Contributor, Author) commented:

I have created a draft PR and successfully retrieved the logs of the failed test pod in plugin_controller.go.

Below is an example of the captured logs from the controller:

{
   "controller": "plugin",
   "controllerGroup": "greenhouse.sap",
   "controllerKind": "Plugin",
   "Plugin": {
      "name": "prometheus",
      "namespace": "demo"
   },
   "namespace": "demo",
   "name": "prometheus",
   "reconcileID": "4bb73e73-9ba7-47e1-ab96-1d1dc5e5dd58",
   "logs": "POD LOGS: prometheus-test\n1..9\nnot ok 1 Verify successful deployment and running status of the prometheus-operator pod\n# (from function `verify' in file /usr/lib/bats/bats-detik/detik.bash, line 193,\n#  in test file /tests/run.sh, line 10)\n#   `verify \"there is 1 deployment named 'prometheus'\"' failed with status 3\n# Valid expression. Verification in progress...\n# Found 0 deployment named prometheus (instead of 1 expected).\nok 2 Verify successful deployment and running status of the prometheus-prometheus service and pod\nok 3 Verify successful deployment and running status of the prometheus-node-exporter pod\nok 4 Verify successful deployment and running status of the prometheus-kube-state-metrics pod\nok 5 Verify successful creation and bound status of prometheus persistent volume claims\nok 6 Verify successful creation and available replicas of prometheus Prometheus resource\nok 7 Verify creation of the prometheus-prometheus statefulset\nok 8 Verify creation of the prometheus-pod-sd PodMonitor\nok 9 Verify creation of required custom resource definitions (CRDs) for prometheus\n\n",
   "error": "test failed"
}
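
For reference, a minimal client-go sketch of how a test pod's logs can be fetched; the helper below is illustrative and not the code from the draft PR.

package controller // hypothetical package name for this sketch

import (
    "context"
    "io"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
)

// getTestPodLogs streams the logs of a Helm test pod and returns them as a string.
func getTestPodLogs(ctx context.Context, clientset kubernetes.Interface, namespace, podName string) (string, error) {
    req := clientset.CoreV1().Pods(namespace).GetLogs(podName, &corev1.PodLogOptions{})
    stream, err := req.Stream(ctx)
    if err != nil {
        return "", err
    }
    defer stream.Close()

    logs, err := io.ReadAll(stream)
    if err != nil {
        return "", err
    }
    return string(logs), nil
}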

One more suggestion from my side: a mix of Plugin.status and Kubernetes Events

I’d like to propose a hybrid approach:

  • Store test results in Plugin.status (as it is today).
  • Emit Kubernetes events for error descriptions to make failures immediately visible in kubectl describe plugin <name>.

How This Would Work:

  1. Store test results in Plugin.status.helmChartTestSucceeded.conditions.

  2. Kubernetes events for error messages:

    • When a Helm test fails, a Warning event is emitted with a concise error message (a sketch follows this list).
    • Example event output:
      kubectl describe plugin prometheus
      ...
      Events:
        Type     Reason           Age   From                   Message
        ----     ------           ----  ----                   -------
        Warning  HelmTestFailed   5m    plugin-controller      Helm test failed: prometheus-operator deployment not found
    • This ensures quick visibility in standard Kubernetes workflows (kubectl get events).
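
A minimal sketch of emitting such an event, assuming a controller-runtime EventRecorder is already wired into the reconciler; the struct, method, reason, and message below are illustrative.

package controller // hypothetical package name for this sketch

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/client-go/tools/record"
)

// PluginReconciler is trimmed down to what this sketch needs.
type PluginReconciler struct {
    recorder record.EventRecorder
}

// emitHelmTestFailedEvent records a Warning event on the Plugin object so the
// failure shows up in kubectl describe plugin <name> and kubectl get events.
func (r *PluginReconciler) emitHelmTestFailedEvent(plugin runtime.Object, reason string) {
    r.recorder.Eventf(plugin, corev1.EventTypeWarning, "HelmTestFailed", "Helm test failed: %s", reason)
}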

Next Steps

WDYT? If this approach doesn't fit, I'm also fine with storing the error message directly in Plugin.status. Let me know your thoughts, and I can proceed accordingly. 🚀

CC: @richardtief @IvoGoman
