Backport of Default "stats_flush_interval" to 1 minute for Consul Telemetry Collector into release/1.15.x #19701

hc-github-team-consul-core

Backport

This PR is auto-generated from #19663 to be assessed for backporting due to the inclusion of the label backport/1.15.

🚨 Warning: automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, but not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@Achooo
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/consul/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Description

Context
We want to reduce the number of metrics processed by the Consul Telemetry Collector.
By default, Envoy has a stats_flush_interval of 5 seconds (see the Envoy docs), so each proxy flushes its stats 12 times per minute.
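
To put the two intervals side by side, here is a small arithmetic sketch. The helper name flushesPerHour is made up for illustration; it only counts flushes per proxy and says nothing about the size of each flush.

```go
package main

import "fmt"

// flushesPerHour (hypothetical helper) returns how many stats flushes a single
// proxy performs in one hour for a given flush interval in seconds.
func flushesPerHour(intervalSeconds int) int {
	return 3600 / intervalSeconds
}

func main() {
	fmt.Println(flushesPerHour(5))  // Envoy's default interval: 720
	fmt.Println(flushesPerHour(60)) // proposed collector default: 60
}
```

Under this simple count, moving from 5s to 60s cuts the number of flushes per proxy by a factor of 12.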

Changes

  • For users who set up the Consul Telemetry Collector, we default the stats_flush_interval to 1 minute (60 seconds). However, we skip this override in 2 cases:
      1. They have a custom envoy_stats_flush_interval: we don't want to override their use case.
      2. They have a custom stats sink: we don't want to impact their custom metrics processing.
      • A custom stats sink can be set in multiple ways: via envoy_statsd_url, envoy_dogstatsd_url, or envoy_extra_stats_sinks_json. To cover all of these cases, we simply check for an empty args.StatsSinksJSON after the stats sinks are built up and before the collector sink is set up.
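
The decision described in this list can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: only the field name StatsSinksJSON comes from the description, while the bootstrapArgs type, the resolveFlushInterval helper, and its parameters are hypothetical.

```go
package main

import "fmt"

// bootstrapArgs is a hypothetical, trimmed-down stand-in for the bootstrap
// template arguments. Only the StatsSinksJSON name appears in the PR text.
type bootstrapArgs struct {
	StatsSinksJSON string // JSON for user-configured stats sinks; empty if none
}

// collectorFlushInterval is the default used when only the collector sink is present.
const collectorFlushInterval = "60s"

// resolveFlushInterval (hypothetical helper) picks the value to write as
// stats_flush_interval. It is meant to run after the user's stats sinks have
// been built and before the collector sink is appended, so a non-empty
// StatsSinksJSON can only mean a user-defined sink. An empty result means the
// field is left out and Envoy falls back to its own default.
func resolveFlushInterval(args bootstrapArgs, customInterval string, collectorEnabled bool) string {
	switch {
	case customInterval != "":
		// Case 1: the user set envoy_stats_flush_interval; keep their value.
		return customInterval
	case !collectorEnabled:
		// No collector configured: nothing to default.
		return ""
	case args.StatsSinksJSON != "":
		// Case 2: a custom sink exists; do not change its flush cadence.
		return ""
	default:
		return collectorFlushInterval
	}
}

func main() {
	statsd := bootstrapArgs{StatsSinksJSON: `{"name":"envoy.stat_sinks.statsd"}`}

	fmt.Printf("%q\n", resolveFlushInterval(bootstrapArgs{}, "", true))    // "60s"
	fmt.Printf("%q\n", resolveFlushInterval(bootstrapArgs{}, "10s", true)) // "10s"
	fmt.Printf("%q\n", resolveFlushInterval(statsd, "", true))             // ""
}
```

The three calls in main mirror the three test scenarios below: collector only, custom flush interval, and custom sink.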

Testing & Reproduction steps

Setup

  1. Ran the Makefile command in consul-dataplane that copies the bootstrap config, with the URL modified to point at my branch.
  2. Built custom consul-dataplane Docker images and pushed them to my own docker.io account.
  3. Applied a consul-k8s installation with the custom Docker images, linked to an HCP self-managed cluster, to see how the Telemetry Gateway reacts. Essentially, an end-to-end test of cloud observability.
View helm value overrides
==> Consul Installation Summary
    Name: consul
    Namespace: consul
    
    Helm value overrides
    --------------------
    connectInject:
      enabled: true
    controller:
      enabled: true
    global:
      acls:
        bootstrapToken:
          secretKey: token
          secretName: consul-bootstrap-token
        manageSystemACLs: true
      cloud:
        apiHost:
          secretKey: api-hostname
          secretName: consul-hcp-api-host
        authUrl:
          secretKey: auth-url
          secretName: consul-hcp-auth-url
        clientId:
          secretKey: client-id
          secretName: consul-hcp-client-id
        clientSecret:
          secretKey: client-secret
          secretName: consul-hcp-client-secret
        enabled: true
        resourceId:
          secretKey: resource-id
          secretName: consul-hcp-resource-id
        scadaAddress:
          secretKey: scada-address
          secretName: consul-hcp-scada-address
      datacenter: test-flush-60-seconds
      gossipEncryption:
        secretKey: key
        secretName: consul-gossip-key
      image: docker.io/achooo/consul01:latest
      imageConsulDataplane: docker.io/achooo/consul-dataplane:1.4.0-dev
      metrics:
        enableTelemetryCollector: true
      tls:
        caCert:
          secretKey: tls.crt
          secretName: consul-server-ca
        enableAutoEncrypt: true
        enabled: true
    server:
      affinity: null
      replicas: 3
      serverCert:
        secretName: consul-server-cert
    telemetryCollector:
      cloud:
        clientId:
          secretKey: client-id
          secretName: consul-hcp-observability-client-id
        clientSecret:
          secretKey: client-secret
          secretName: consul-hcp-observability-client-secret
      enabled: true

Scenario 1: No custom stats sink / no custom flush interval

  4. 🥳 After the above installation, with the new consul-dataplane image and the collector deployed, we see that the default stats_flush_interval is 60s.
Screenshot 2023-11-15 at 9 37 41 PM
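
The screenshots show the value being read by eye. A scripted check could look like the sketch below. It assumes you already have the Envoy bootstrap document as JSON with stats_flush_interval as a top-level string field; how that JSON is obtained is not covered by this PR, and the flushInterval helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flushInterval (hypothetical helper) returns the top-level
// stats_flush_interval of an Envoy bootstrap JSON document and reports
// whether the field was present at all.
func flushInterval(bootstrapJSON []byte) (string, bool, error) {
	var cfg struct {
		// A pointer distinguishes "field absent" from "field set to an empty string".
		StatsFlushInterval *string `json:"stats_flush_interval"`
	}
	if err := json.Unmarshal(bootstrapJSON, &cfg); err != nil {
		return "", false, err
	}
	if cfg.StatsFlushInterval == nil {
		return "", false, nil
	}
	return *cfg.StatsFlushInterval, true, nil
}

func main() {
	// Sample documents, reduced to the one field of interest.
	withDefault := []byte(`{"stats_flush_interval": "60s"}`)
	withoutField := []byte(`{}`)

	v, ok, _ := flushInterval(withDefault)
	fmt.Printf("%q %v\n", v, ok) // "60s" true

	v, ok, _ = flushInterval(withoutField)
	fmt.Printf("%q %v\n", v, ok) // "" false
}
```

An absent field corresponds to the custom-sink scenario, where Envoy is expected to use its internal 5s default.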

Scenario 2: Custom flush interval

  1. Applied a ProxyDefaults configuration to set a custom flush interval:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  config:
    envoy_stats_flush_interval: "10s"

❯ kubectl apply -f proxydefaults.yaml -n consul
proxydefaults.consul.hashicorp.com/global created
  2. I then restarted all the pods.
  3. 🥳 I verified the “stats_flush_interval” value again; it is 10s, as expected.
Screenshot 2023-11-15 at 9 59 01 PM

Scenario 3: Custom sinks

  1. I re-created the entire cluster.
  2. I set up ProxyDefaults with a statsd URL:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  config:
    envoy_statsd_url: "udp://127.0.0.1:8125"
❯ kubectl apply -f proxydefaults.yaml -n consul
proxydefaults.consul.hashicorp.com/global created
  3. 🥳 Validated that the bootstrap config contains no “stats_flush_interval” setting (we expect Envoy to fall back to its internal default of 5s). The field is absent, as we can see:
Screenshot 2023-11-15 at 10 24 17 PM

Links

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits

@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/consul-collector-reduce-flush-intervals/externally-natural-pangolin branch from 00c2bdb to f059d47 Compare November 20, 2023 21:18
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/consul-collector-reduce-flush-intervals/externally-natural-pangolin branch 2 times, most recently from b329276 to f059d47 Compare November 20, 2023 21:18

Auto approved Consul Bot automated PR

@github-actions github-actions bot added the theme/cli Flags and documentation for the CLI interface label Nov 20, 2023
@vercel vercel bot temporarily deployed to Preview – consul November 20, 2023 21:24 Inactive

hashicorp-cla commented Dec 6, 2023

CLA assistant check
All committers have signed the CLA.

@johnbuonassisi

Closing to create a manual backport
