Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka: add hook to log connection error reasons #541

Merged
merged 6 commits into from
Aug 12, 2024

Conversation

inge4pres
Copy link
Contributor

@inge4pres inge4pres commented Aug 9, 2024

Reason for this PR

The current metricsHook struct that collects metrics based on Kafka hooks is not equipped to also report details of why connection errors are caused.

Details

We provide a new struct that implements a (single, for now) kgo Hook dedicated to logging errors during the broker connection phase.
The struct can be iterated in the future to implement other monitoring hooks, providing logging utilities where the metrics would have too high cardinality.

Additional context

Currently, the proposed design adds the new logger hook behind the CommonConfig.DisableTelemetry flag (similarly to the metrics hook).
This can be improved and it is not a perfect solution, but it has to be done this way until a bigger refactor is done to expose the CommonConfig.hook fieldin some way.

The more idiomatic, but also more expensive, solution would be to allow exporting franz-go logs into the Logger.
This is mentioned in #540

@inge4pres inge4pres requested a review from a team as a code owner August 9, 2024 15:58
kafka/logger.go Show resolved Hide resolved
1pkg
1pkg previously approved these changes Aug 9, 2024
kafka/logger_test.go Outdated Show resolved Hide resolved
@inge4pres
Copy link
Contributor Author

inge4pres commented Aug 9, 2024

Unpleasant discovery: the newly added test is currently flaky because of franz-go implicitly retrying to reconnect at least once, in handleReq
https://github.com/twmb/franz-go/blob/56f1b061c895fec2ea773f48321cc8715f6fefbb/pkg/kgo/broker.go#L302-L309

Test failure here.
https://github.com/elastic/apm-queue/actions/runs/10324596296/job/28584375960?pr=541#step:4:8413

I think we should disable the retry to make the test consistent but I am unsure how to pass that option to the client 🤔

inge4pres and others added 6 commits August 9, 2024 21:44
Signed-off-by: inge4pres <[email protected]>
Signed-off-by: inge4pres <[email protected]>
Co-authored-by: Edoardo Tenani <[email protected]>
Signed-off-by: inge4pres <[email protected]>
@inge4pres inge4pres force-pushed the issues/apm-managed/951 branch from 85faa63 to 6194881 Compare August 9, 2024 19:44
@inge4pres
Copy link
Contributor Author

I think we should disable the retry to make the test consistent but I am unsure how to pass that option to the client 🤔

Not seeing an option to control the connection retry, I ended up changing the assertion: we now validate that there is either a single or 2 log lines containing the error

@inge4pres inge4pres requested review from endorama and 1pkg August 9, 2024 19:46
@inge4pres inge4pres changed the title Kafka: include Kafka broker hook to log connection error reasons Kafka: add hook to log connection error reasons Aug 9, 2024
@inge4pres inge4pres merged commit d9d5a01 into main Aug 12, 2024
6 checks passed
@inge4pres inge4pres deleted the issues/apm-managed/951 branch August 12, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants