Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add agent injector telemetry #703

Merged
merged 5 commits into from
Dec 2, 2024

Conversation

gsantos-hc
Copy link
Contributor

Add Prometheus metrics to monitor the Agent Injector's performance. New metrics include a gauge of current requests being processed by the webhook, a summary of request processing times, and a count of successful and failed injections by Kubernetes namespace.

Fixes AG-005161.

@gsantos-hc gsantos-hc requested a review from a team as a code owner November 8, 2024 14:30
@gsantos-hc gsantos-hc marked this pull request as draft November 8, 2024 19:32
@gsantos-hc gsantos-hc marked this pull request as ready for review November 8, 2024 19:47
subcommand/injector/command.go Show resolved Hide resolved
agent-inject/metrics.go Outdated Show resolved Hide resolved
Copy link
Member

@tvoran tvoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked well when I was running it locally, just left a couple thoughts.

agent-inject/metrics.go Outdated Show resolved Hide resolved
agent-inject/handler.go Outdated Show resolved Hide resolved
agent-inject/handler.go Outdated Show resolved Hide resolved
@gsantos-hc gsantos-hc marked this pull request as draft November 15, 2024 21:03
@gsantos-hc gsantos-hc marked this pull request as ready for review November 15, 2024 21:17
@tvoran tvoran added this to the 1.6.0 milestone Nov 19, 2024
tvoran
tvoran previously approved these changes Nov 19, 2024
Copy link
Member

@tvoran tvoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -142,11 +151,20 @@ func (h *Handler) Handle(w http.ResponseWriter, r *http.Request) {
msg := fmt.Sprintf("error marshalling admission response: %s", err)
http.Error(w, msg, http.StatusInternalServerError)
h.Log.Error("error on request", "Error", msg, "Code", http.StatusInternalServerError)
incrementInjectionFailures(admReq.Request.Namespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want to call incrementInjectionFailures() for the error returns further up in this function too? But this should suffice as-is; I could even see those being a different error metric added in the future.

@gsantos-hc gsantos-hc marked this pull request as draft November 20, 2024 20:19
Add Prometheus metrics to monitor the Agent Injector's performance. New
metrics include a gauge of current requests being processed by the
webhook, a summary of request processing times, and a count of
successful and failed injections by Kubernetes namespace.

Successful injections are broken down by injection type. The `injection_type`
label can assume the value `init_only` for injections with only an
initContainer (no sidecar) and `sidecar` for all other cases (sidecar
only or sidecar + initContainer).

Fixes AG-005161.
@gsantos-hc gsantos-hc force-pushed the feat/injector-telemetry branch from a2ec614 to f3f6b3a Compare November 21, 2024 20:21
@gsantos-hc
Copy link
Contributor Author

gsantos-hc commented Nov 21, 2024

Force-pushed:

  • Add a breakdown of injection types (normal sidecar vs initContainer-only) to further help identify potential targets for migration to VSO
  • Rebase from the latest main
  • Squash commits

@gsantos-hc gsantos-hc requested a review from tvoran November 21, 2024 20:23
@gsantos-hc gsantos-hc marked this pull request as ready for review November 21, 2024 20:23
@tvoran tvoran dismissed their stale review November 22, 2024 02:30

changes

agent-inject/handler.go Outdated Show resolved Hide resolved
agent-inject/handler.go Outdated Show resolved Hide resolved
agent-inject/metrics.go Outdated Show resolved Hide resolved
agent-inject/metrics.go Outdated Show resolved Hide resolved
Update the `Mutate()` method to return a struct that extends the
existing return data (AdmissionResponse) with metadata on the types of
Vault Agent injections made. The metadata informs the count of
injections by namespace, which are now further broken down by type of
injection.
Copy link
Member

@tvoran tvoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, just a couple minor thoughts (but nothing to hold this up over).

I was also playing around with ways to test the logic in incrementInjections(), and it looks like we could use prometheus' testutil something like this: https://gist.github.com/tvoran/b6ec51e716fd10efe7bb1847def8e98a

agent-inject/metrics.go Show resolved Hide resolved
agent-inject/metrics.go Outdated Show resolved Hide resolved
@benashz benashz self-requested a review November 25, 2024 17:52
Copy link
Contributor

@benashz benashz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I have included some additional feedback for your consideration. Thanks!

Subsystem: metricsSubsystem,
Name: "injections_by_namespace_total",
Help: "Total count of Agent Sidecar injections by namespace",
}, []string{metricsLabelNamespace, metricsLabelType})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that the number of namespaces should be relatively low, so we should be okay with this high cardinality metric.

subcommand/injector/command.go Show resolved Hide resolved
agent-inject/metrics.go Outdated Show resolved Hide resolved
agent-inject/metrics.go Outdated Show resolved Hide resolved
requestQueue.Inc()
requestStart := time.Now()
defer func() {
requestProcessingTime.Observe(float64(time.Since(requestStart).Milliseconds()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if would be worth tracking the failures as well. We do something like this in VSO here: https://github.com/hashicorp/vault-secrets-operator/blob/6ab056c22acb5dc4812620233e3b141fb9e02f9c/vault/client.go#L810

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, and it would unblock @tvoran's feedback here.

@benashz Given that it goes beyond the scope we were asked for, is it OK if I push that to a follow-on PR?

agent-inject/metrics.go Outdated Show resolved Hide resolved
Copy link
Contributor

@benashz benashz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@benashz benashz merged commit b05b4ca into hashicorp:main Dec 2, 2024
1 check passed
@tvoran tvoran mentioned this pull request Dec 2, 2024
@tvoran tvoran added the enhancement New feature or request label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request injector metrics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants