Reconsider design of restart events issued when rolling Kafka Pods #10958

Open
scholzj opened this issue Dec 15, 2024 · 4 comments

Comments

@scholzj
Member

scholzj commented Dec 15, 2024

Currently, when the Kafka pods are rolled, we issue Kubernetes Events describing the reason for the restart. This is done only for Kafka, Connect, and MM2 node restarts. The events are issued with the Pods as the main objects.

This approach has several issues:

  • Pods have many events when they are restarted, so the restart reason event issued by the Strimzi CO is easily lost among them.
  • Very often, the restart reason is "Pod has old revision", which means that the Pod definition has changed. But the root cause of the change could be, for example, an updated listener certificate or something similar.

I think we should consider the future of the events used by the operator. Two options for how to deal with them come to mind:

  1. We can issue them with the custom resource as the main object they reference (i.e. the regarding field). That would make it easier to find the events as the custom resource will have only our events and not the events related to the Pod lifecycle. The Pod might be referenced as the related resource if needed. Issuing the events to the custom resource might also make it easier to consider other situations when we might want to issue events.
  2. We can simply remove them. There seem to be some users using them, but I think it is a relatively small number of users. So removing them might help to simplify our codebase and testing.
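
To make option 1 more concrete, here is a minimal sketch of what such an event could look like as an `events.k8s.io/v1` manifest, with the custom resource in `regarding` and the Pod in `related`. It is only an illustration and not how the operator builds events today: the function name, the `reportingController`, `reportingInstance`, and `action` values, the `generateName` prefix, and the `kafka.strimzi.io/v1beta2` API version are all assumptions made for the example.

```python
from datetime import datetime, timezone


def build_restart_event(cr_kind, cr_name, namespace, pod_name, reason, note):
    """Build an events.k8s.io/v1 Event manifest that points at the custom resource.

    Hypothetical helper: it only assembles a dict and does not talk to a cluster.
    """
    now = datetime.now(timezone.utc)
    return {
        "apiVersion": "events.k8s.io/v1",
        "kind": "Event",
        "metadata": {
            # generateName lets the API server append a unique suffix (prefix is assumed)
            "generateName": f"{cr_name}-restart-",
            "namespace": namespace,
        },
        # eventTime is a MicroTime: RFC 3339 with microsecond precision
        "eventTime": now.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "reportingController": "strimzi.io/cluster-operator",  # assumed value
        "reportingInstance": "strimzi-cluster-operator",       # assumed value
        "action": "RestartPod",                                # assumed value
        "type": "Normal",
        "reason": reason,
        "note": note,
        # Main object of the event: the custom resource, not the Pod
        "regarding": {
            "apiVersion": "kafka.strimzi.io/v1beta2",  # assumed API version
            "kind": cr_kind,
            "name": cr_name,
            "namespace": namespace,
        },
        # Secondary object: the Pod that was actually rolled
        "related": {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": pod_name,
            "namespace": namespace,
        },
    }


event = build_restart_event(
    cr_kind="Kafka",
    cr_name="my-cluster",
    namespace="kafka",
    pod_name="my-cluster-broker-0",
    reason="PodHasOldRevision",  # illustrative reason string
    note="Pod has old revision",
)
```

With this shape, anything that lists events for the `Kafka` resource sees only the operator's events, while the Pod name is still available through `related`.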

scholzj changed the title from "Reconsider design restart events issued when rolling Kafka Pods" to "Reconsider design of restart events issued when rolling Kafka Pods" on Dec 15, 2024
@ppatierno
Member

I have never used them directly, but my guess is that Kube events are useful for some users (as you said, we have such users, even if not that many). Even the idea of integrating self-healing could rely on events in the future.
My take on this is to go with option 1. I agree that our events could be "missed" among the Pod lifecycle events. My question is: which custom resource are you referring to? Our Pods are related to the StrimziPodSet resource, so are you referring to it and then using "related" to specify the specific Pod?

@scholzj
Member Author

scholzj commented Dec 17, 2024

I was thinking of the Kafka, KafkaConnect, KafkaMirrorMaker2, etc. resources.
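
As a rough illustration of why that helps with finding the events, the sketch below filters an in-memory list of events by the kind of the `regarding` object. The sample data, the function name, and the reason string on the operator event are made up for the example; the Pod lifecycle reasons are typical ones emitted by the scheduler and kubelet.

```python
# Kinds named above as candidates for the event's main object.
CUSTOM_RESOURCE_KINDS = {"Kafka", "KafkaConnect", "KafkaMirrorMaker2"}


def events_on_custom_resources(events, kinds=CUSTOM_RESOURCE_KINDS):
    """Return only the events whose `regarding` object is one of the given kinds."""
    return [e for e in events if e.get("regarding", {}).get("kind") in kinds]


# Hypothetical sample: Pod lifecycle noise plus one operator-issued restart event.
sample_events = [
    {"reason": "Killing", "regarding": {"kind": "Pod", "name": "my-cluster-broker-0"}},
    {"reason": "Scheduled", "regarding": {"kind": "Pod", "name": "my-cluster-broker-0"}},
    {"reason": "Pulled", "regarding": {"kind": "Pod", "name": "my-cluster-broker-0"}},
    {"reason": "Started", "regarding": {"kind": "Pod", "name": "my-cluster-broker-0"}},
    {
        "reason": "PodHasOldRevision",  # illustrative reason string
        "regarding": {"kind": "Kafka", "name": "my-cluster"},
        "related": {"kind": "Pod", "name": "my-cluster-broker-0"},
    },
]

operator_events = events_on_custom_resources(sample_events)
for e in operator_events:
    print(e["reason"], "->", e["related"]["name"])
```

Only the restart event survives the filter, and the affected Pod is still recoverable from `related`.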

@katheris
Contributor

katheris commented Jan 2, 2025

I would also vote for option 1 of having them use the custom resource as the main object they reference. I think it is useful to be able to see the restart as an event rather than having to always look through the operator logs.

@im-konge
Member

im-konge commented Jan 9, 2025

Triaged on 9.1.2025: We should go with option 1 and keep this issue open.
