[plugin-chart-test] Improve Helm Test Cleanup and Logging #589

Open
4 of 6 tasks
ibakshay opened this issue Jan 23, 2025 · 3 comments · May be fixed by cloudoperators/greenhouse#923

ibakshay (Contributor) commented Jan 23, 2025

Priority

None

Is your feature request related to a problem?

Currently, when a Helm chart test fails:

  1. Resources created by the test are not immediately cleaned up, leaving debris in the cluster
  2. Logs from the test pod are not added to the plugin resource status field. This forces users to manually run helm test in remote clusters to debug test failures, leading to delays in troubleshooting.

Acceptance Criteria

  • Test Pod Logs in Plugin CR Status Field:

    • Logs from the Helm test pod are captured upon test completion and appended to the status field of the plugin custom resource (CR).
    • Logs are concise, well-formatted, and user-friendly for quick debugging.
    • Users can view the test logs directly in the plugin CR status without manually interacting with the cluster.
  • Resource Cleanup:

    • All resources created during a Helm test are cleaned up after the plugin is removed.
    • Cleanup logic addresses ConfigMaps, Pods, RBAC permissions, and any other resources created during tests (a sketch follows this list).
  • Update Documentation

    • Documentation is updated accordingly.
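
Helm can delete test hook resources on its own via the helm.sh/hook-delete-policy annotation, but anything that survives the test run would need explicit cleanup in the controller. Below is a minimal Go sketch of such a cleanup step; it assumes test resources carry an identifying label, and the label key and helper name are hypothetical rather than an existing project convention.

package controller // hypothetical package name for this sketch

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    rbacv1 "k8s.io/api/rbac/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupHelmTestResources removes leftovers of a Helm chart test run.
// The "greenhouse.sap/helm-test" label is an assumption of this sketch,
// not an existing convention of the project.
func cleanupHelmTestResources(ctx context.Context, c client.Client, namespace string) error {
    selector := client.MatchingLabels{"greenhouse.sap/helm-test": "true"}
    inNS := client.InNamespace(namespace)

    // Pods and ConfigMaps created by the test hooks.
    if err := c.DeleteAllOf(ctx, &corev1.Pod{}, inNS, selector); err != nil {
        return fmt.Errorf("deleting test pods: %w", err)
    }
    if err := c.DeleteAllOf(ctx, &corev1.ConfigMap{}, inNS, selector); err != nil {
        return fmt.Errorf("deleting test configmaps: %w", err)
    }
    // RBAC objects created for the test are removed the same way.
    if err := c.DeleteAllOf(ctx, &rbacv1.RoleBinding{}, inNS, selector); err != nil {
        return fmt.Errorf("deleting test rolebindings: %w", err)
    }
    return nil
}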

Additional context

No response

Reference Issues

Depends on cloudoperators/greenhouse#585

@ibakshay ibakshay self-assigned this Jan 23, 2025
@ibakshay ibakshay moved this from No status to Backlog in Observability Roadmap Jan 23, 2025
ibakshay (Contributor, Author) commented:

@richardtief and I aligned on introducing a separate status field in the Plugin for the Helm chart test results. The status field would look like this:

 status:
  helmChartTestSucceeded:
    lastTransitionTime: "2025-02-13T21:28:47Z"
    status: "False"
    conditions: 
     - message: Verify that there is a service named perses
       succeeded: true
     - message: Verify that there is a configmap named perses
       succeeded: true
     - message: Verify failure deployment and running status of the perses pod
       succeeded: false

@IvoGoman Do you have any comments on this structure?
This is an enhancement to this issue cloudoperators/greenhouse#106
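
For illustration only, a rough Go sketch of how the proposed field could be declared in the Plugin API types, assuming kubebuilder-style CRD types; the type and field names simply mirror the YAML above and are not a final API.

package v1alpha1 // hypothetical API package for this sketch

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// HelmChartTestResult captures the outcome of a single Helm test case.
type HelmChartTestResult struct {
    // Message describes the test case.
    Message string `json:"message"`
    // Succeeded reports whether the test case passed.
    Succeeded bool `json:"succeeded"`
}

// HelmChartTestStatus aggregates the result of a Helm chart test run.
type HelmChartTestStatus struct {
    // Status is "True" if all test cases succeeded, "False" otherwise.
    Status metav1.ConditionStatus `json:"status"`
    // LastTransitionTime records when the overall status last changed.
    LastTransitionTime metav1.Time `json:"lastTransitionTime"`
    // Conditions lists the individual test case results.
    Conditions []HelmChartTestResult `json:"conditions,omitempty"`
}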

IvoGoman (Contributor) commented:

I am not sure about adding an additional field to the Plugin status. We already have this condition as part of Plugin.status.statusConditions. If this needs to be visible in the Dashboard, it would require additional effort to visualize a dedicated field for the Helm chart tests. It would no longer be obvious how the overall Ready condition is calculated. Internally in the controller it could be done, but the connection between the statusConditions and the dedicated helmChartTestSucceeded field would not be obvious.

I would prefer reusing the already existing condition in the StatusConditions. Ideally, the message could contain a comma-separated list of the failed test cases? This would fit in with the current dashboard and also make the condition easier to parse, as it only lists the failed tests. The nested structure requires reading through all conditions and looking for the one(s) that failed. Depending on the number of tests, this list could also become quite long, and for debugging only the failed ones are relevant.
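
As a rough illustration of this alternative (not the project's actual code), the existing condition could carry only the failed test names in its message; the sketch below assumes apimachinery's condition helpers and plain metav1.Condition, while the condition type and reasons are placeholders.

package controller // hypothetical package name for this sketch

import (
    "strings"

    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setHelmChartTestCondition folds the Helm test outcome into the existing
// status conditions, listing only the failed test cases in the message.
func setHelmChartTestCondition(conditions *[]metav1.Condition, failedTests []string) {
    cond := metav1.Condition{
        Type:   "HelmChartTestSucceeded", // assumed condition type
        Status: metav1.ConditionTrue,
        Reason: "HelmChartTestSucceeded",
    }
    if len(failedTests) > 0 {
        cond.Status = metav1.ConditionFalse
        cond.Reason = "HelmChartTestFailed"
        cond.Message = "failed tests: " + strings.Join(failedTests, ", ")
    }
    meta.SetStatusCondition(conditions, cond)
}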

ibakshay (Contributor, Author) commented:

I have created a draft PR and successfully retrieved the logs of the failed test pod in plugin_controller.go.

Below is an example of the captured logs from the controller:

{
   "controller": "plugin",
   "controllerGroup": "greenhouse.sap",
   "controllerKind": "Plugin",
   "Plugin": {
      "name": "prometheus",
      "namespace": "demo"
   },
   "namespace": "demo",
   "name": "prometheus",
   "reconcileID": "4bb73e73-9ba7-47e1-ab96-1d1dc5e5dd58",
   "logs": "POD LOGS: prometheus-test\n1..9\nnot ok 1 Verify successful deployment and running status of the prometheus-operator pod\n# (from function `verify' in file /usr/lib/bats/bats-detik/detik.bash, line 193,\n#  in test file /tests/run.sh, line 10)\n#   `verify \"there is 1 deployment named 'prometheus'\"' failed with status 3\n# Valid expression. Verification in progress...\n# Found 0 deployment named prometheus (instead of 1 expected).\nok 2 Verify successful deployment and running status of the prometheus-prometheus service and pod\nok 3 Verify successful deployment and running status of the prometheus-node-exporter pod\nok 4 Verify successful deployment and running status of the prometheus-kube-state-metrics pod\nok 5 Verify successful creation and bound status of prometheus persistent volume claims\nok 6 Verify successful creation and available replicas of prometheus Prometheus resource\nok 7 Verify creation of the prometheus-prometheus statefulset\nok 8 Verify creation of the prometheus-pod-sd PodMonitor\nok 9 Verify creation of required custom resource definitions (CRDs) for prometheus\n\n",
   "error": "test failed"
}
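
For reference, a minimal client-go sketch of how a test pod's logs can be fetched; the helper below is illustrative and not the code from the draft PR.

package controller // hypothetical package name for this sketch

import (
    "context"
    "io"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
)

// getTestPodLogs streams the logs of a Helm test pod and returns them as a string.
func getTestPodLogs(ctx context.Context, clientset kubernetes.Interface, namespace, podName string) (string, error) {
    req := clientset.CoreV1().Pods(namespace).GetLogs(podName, &corev1.PodLogOptions{})
    stream, err := req.Stream(ctx)
    if err != nil {
        return "", err
    }
    defer stream.Close()

    logs, err := io.ReadAll(stream)
    if err != nil {
        return "", err
    }
    return string(logs), nil
}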

One more suggestion from my side: a mix of Plugin.status and Kubernetes Events

I’d like to propose a hybrid approach:

  • Store test results in Plugin.status (as it is today).
  • Emit Kubernetes events for error descriptions to make failures immediately visible in kubectl describe plugin <name>.

How This Would Work:

  1. Store test results in Plugin.status.helmChartTestSucceeded.conditions.

  2. Kubernetes events for error messages:

    • When a Helm test fails, a Warning event is emitted with a concise error message (a sketch follows this list).
    • Example event output:
      kubectl describe plugin prometheus
      ...
      Events:
        Type     Reason           Age   From                   Message
        ----     ------           ----  ----                   -------
        Warning  HelmTestFailed   5m    plugin-controller      Helm test failed: prometheus-operator deployment not found
    • This ensures quick visibility in standard Kubernetes workflows (kubectl get events).
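
A minimal sketch of emitting such an event, assuming a controller-runtime EventRecorder is already wired into the reconciler; the struct, method, reason, and message below are illustrative.

package controller // hypothetical package name for this sketch

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/client-go/tools/record"
)

// PluginReconciler is trimmed down to what this sketch needs.
type PluginReconciler struct {
    recorder record.EventRecorder
}

// emitHelmTestFailedEvent records a Warning event on the Plugin object so the
// failure shows up in kubectl describe plugin <name> and kubectl get events.
func (r *PluginReconciler) emitHelmTestFailedEvent(plugin runtime.Object, reason string) {
    r.recorder.Eventf(plugin, corev1.EventTypeWarning, "HelmTestFailed", "Helm test failed: %s", reason)
}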

Next Steps

WDYT? If this approach doesn't fit, I'm also fine with storing the error message directly in Plugin.status. Let me know your thoughts, and I can proceed accordingly. 🚀

CC: @richardtief @IvoGoman
