Backport of Default "stats_flush_interval" to 1 minute for Consul Telemetry Collector into release/1.16.x #19702


hc-github-team-consul-core

Backport

This PR is auto-generated from #19663 to be assessed for backporting due to the inclusion of the label backport/1.16.

The below text is copied from the body of the original PR.


Description

Context
We want to reduce the number of metrics processed by the Consul Telemetry Collector.
By default, Envoy has a stats_flush_interval of 5 seconds (per the Envoy docs), so a longer interval means stats are flushed to the collector less often.

Changes

  • For users who set up the Consul Telemetry Collector, we default stats_flush_interval to 1 minute (60 seconds). We skip this override in two cases:
      1. They set a custom envoy_stats_flush_interval: we don't want to override their choice.
      2. They configured a custom stats sink: we don't want to affect their custom metrics processing.
      • A custom stats sink can be set in multiple ways: via envoy_statsd_url, envoy_dogstatsd_url, or envoy_extra_stats_sinks_json. To cover all of these cases, we check whether args.StatsSinksJSON is empty after the user's stats sinks are built up and before the collector sink is set up.
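
The decision described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: apart from StatsSinksJSON, which the description names, every identifier here (bootstrapArgs, StatsFlushInterval, applyCollectorDefault, collectorFlushInterval) is a hypothetical stand-in.

```go
package main

import "fmt"

// bootstrapArgs is a hypothetical, trimmed-down stand-in for the bootstrap
// template arguments. Only StatsSinksJSON is named in the PR description.
type bootstrapArgs struct {
	StatsSinksJSON     string // user-configured stats sinks as JSON; empty if none
	StatsFlushInterval string // rendered as stats_flush_interval; empty leaves Envoy's default
}

// collectorFlushInterval is the 1 minute default from the description.
const collectorFlushInterval = "60s"

// applyCollectorDefault must run after the user's stats sinks have been built
// into args.StatsSinksJSON and before the collector's own sink is appended,
// so that an empty StatsSinksJSON means "no custom sink".
func applyCollectorDefault(args *bootstrapArgs, customFlushInterval string, collectorEnabled bool) {
	// Case 1: a custom envoy_stats_flush_interval always wins.
	if customFlushInterval != "" {
		args.StatsFlushInterval = customFlushInterval
		return
	}
	// Case 2: a custom stats sink exists, so leave the interval unset.
	if args.StatsSinksJSON != "" {
		return
	}
	// Otherwise default to 1 minute, but only when the collector is in use.
	if collectorEnabled {
		args.StatsFlushInterval = collectorFlushInterval
	}
}

func main() {
	noOverrides := &bootstrapArgs{}
	applyCollectorDefault(noOverrides, "", true)
	fmt.Printf("no overrides: %q\n", noOverrides.StatsFlushInterval)

	customInterval := &bootstrapArgs{}
	applyCollectorDefault(customInterval, "10s", true)
	fmt.Printf("custom interval: %q\n", customInterval.StatsFlushInterval)

	customSink := &bootstrapArgs{StatsSinksJSON: `{"name":"envoy.stat_sinks.statsd"}`}
	applyCollectorDefault(customSink, "", true)
	fmt.Printf("custom sink: %q\n", customSink.StatsFlushInterval)
}
```

Checking the sinks string once, instead of each of the three config keys separately, is what lets a single condition cover every way of configuring a custom sink.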

Testing & Reproduction steps

Setup

  1. Ran the Makefile command that copies the bootstrap config into consul-dataplane, with the URL modified to point to my branch.
  2. Built custom consul-dataplane Docker images and pushed them to my own docker.io account.
  3. Applied a consul-k8s installation with the custom Docker images and linked an HCP self-managed cluster to see how the Telemetry Gateway reacts. Essentially, an end-to-end test of cloud observability.
View helm value overrides
==> Consul Installation Summary
    Name: consul
    Namespace: consul
    
    Helm value overrides
    --------------------
    connectInject:
      enabled: true
    controller:
      enabled: true
    global:
      acls:
        bootstrapToken:
          secretKey: token
          secretName: consul-bootstrap-token
        manageSystemACLs: true
      cloud:
        apiHost:
          secretKey: api-hostname
          secretName: consul-hcp-api-host
        authUrl:
          secretKey: auth-url
          secretName: consul-hcp-auth-url
        clientId:
          secretKey: client-id
          secretName: consul-hcp-client-id
        clientSecret:
          secretKey: client-secret
          secretName: consul-hcp-client-secret
        enabled: true
        resourceId:
          secretKey: resource-id
          secretName: consul-hcp-resource-id
        scadaAddress:
          secretKey: scada-address
          secretName: consul-hcp-scada-address
      datacenter: test-flush-60-seconds
      gossipEncryption:
        secretKey: key
        secretName: consul-gossip-key
      image: docker.io/achooo/consul01:latest
      imageConsulDataplane: docker.io/achooo/consul-dataplane:1.4.0-dev
      metrics:
        enableTelemetryCollector: true
      tls:
        caCert:
          secretKey: tls.crt
          secretName: consul-server-ca
        enableAutoEncrypt: true
        enabled: true
    server:
      affinity: null
      replicas: 3
      serverCert:
        secretName: consul-server-cert
    telemetryCollector:
      cloud:
        clientId:
          secretKey: client-id
          secretName: consul-hcp-observability-client-id
        clientSecret:
          secretKey: client-secret
          secretName: consul-hcp-observability-client-secret
      enabled: true

Scenario 1: No custom stats sink / no custom flush interval

  4. 🥳 After the above installation with the new consul-dataplane image and the collector deployed, the default stats_flush_interval is 60s.
Screenshot 2023-11-15 at 9 37 41 PM

Scenario 2: Custom flush interval

  1. Applied a ProxyDefaults configuration to set a custom flush interval
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  config:
    envoy_stats_flush_interval: "10s"

❯ kubectl apply -f proxydefaults.yaml -n consul
proxydefaults.consul.hashicorp.com/global created
  2. Restarted all the pods.
  3. 🥳 Verified the “stats_flush_interval” value again, which is 10s as expected.
Screenshot 2023-11-15 at 9 59 01 PM

Scenario 3: Custom sinks

  1. Re-created the entire cluster.
  2. Set up ProxyDefaults with a statsd URL:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  config:
    envoy_statsd_url: "udp://127.0.0.1:8125"
❯ kubectl apply -f proxydefaults.yaml -n consul
proxydefaults.consul.hashicorp.com/global created
  3. 🥳 Validated that the bootstrap has no “stats_flush_interval” configuration (we expect Envoy to fall back to its internal default of 5s); the field is empty, as we can see:
Screenshot 2023-11-15 at 10 24 17 PM
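
The same check could be scripted for all three scenarios instead of being read off a screenshot. The sketch below assumes the rendered Envoy bootstrap is available as JSON with stats_flush_interval as a top-level string field. How that JSON is obtained is not covered by this PR, and flushIntervalFromBootstrap is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flushIntervalFromBootstrap reports the top-level stats_flush_interval of an
// Envoy bootstrap JSON document and whether the field was present at all.
// When it is absent, Envoy is expected to use its internal 5s default.
func flushIntervalFromBootstrap(bootstrapJSON []byte) (string, bool, error) {
	var cfg struct {
		// A pointer distinguishes "field missing" from "empty string".
		StatsFlushInterval *string `json:"stats_flush_interval"`
	}
	if err := json.Unmarshal(bootstrapJSON, &cfg); err != nil {
		return "", false, fmt.Errorf("parsing bootstrap: %w", err)
	}
	if cfg.StatsFlushInterval == nil {
		return "", false, nil
	}
	return *cfg.StatsFlushInterval, true, nil
}

func main() {
	// Hand-written sample documents standing in for the three scenarios.
	samples := map[string]string{
		"scenario 1 (no overrides)":    `{"stats_flush_interval": "60s"}`,
		"scenario 2 (custom interval)": `{"stats_flush_interval": "10s"}`,
		"scenario 3 (custom sink)":     `{"stats_sinks": []}`,
	}
	for name, doc := range samples {
		interval, set, err := flushIntervalFromBootstrap([]byte(doc))
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s: set=%t interval=%q\n", name, set, interval)
	}
}
```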

Links

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits

Chris S. Kim and others added 30 commits October 3, 2023 10:11
change needed for fix in consul-enterprise
* feat: add container tests for resource http api with acl enabled

* refactor: clean up
…erver, and getting envoy bootstrap params (#19049)

* NET-5590 - authorization: check for identity:write in CA certs, xds server, and getting envoy bootstrap params

* gofmt file
Whenever a traffic permission exists for a given workload identity, turn on default deny.

Previously, this was only working at the port level.
fix explicit destination integration test
* updated architecture topic

* fixed type in arch diagram filenames

* fixed path to img file

* updated index page - still need to add links

* moved arch and tech specs to reference folder

* moved other ref topics to ref folder

* set up the Deploy folder and TF install topics

* merged secure conf into TF deploy instructions

* moved bind addr and route conf to their own topics

* moved arch and tech specs back to main folder

* update migrate-existing-tasks content

* merged manual deploy content; added serv conf ref

* fixed links

* added procedure for upgrading to dataplanes

* fixed linked reported by checker

* added updates to dataplanes overview page

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>
Co-authored-by: Ganesh S <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>
Co-authored-by: Ganesh S <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>
Co-authored-by: Ganesh S <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>
Co-authored-by: Ganesh S <[email protected]>

* updated links and added redirects

* removed old architecture content

---------

Co-authored-by: Jeff Boruszak <[email protected]>
Co-authored-by: Ganesh S <[email protected]>
…ET-3463 (#18959)

* Add InboundPeerTrustBundle maps to Terminating Gateway

* Add notify and cancelation of watch for inbound peer trust bundles

* Pass peer trust bundles to the RBAC creation function

* Regenerate Golden Files

* add changelog, also adds another spot that needed peeredTrustBundles

* Add basic test for terminating gateway with peer trust bundle

* Add intention to cluster peered golden test

* rerun codegen

* update changelog

* really update the changelog

---------

Co-authored-by: Melisa Griffin <[email protected]>
Update jira-pr.yaml

Change from `hub` to `gh` for checking member roles
Make raft-wal default when v2 catalog experiment is on
Add traffic permissions integration tests.
This PR fixes an issue where upstreams did not correctly inherit the proper
namespace / partition from the parent service when attempting to fetch the
upstream protocol due to inconsistent normalization.

Some of the merge-service-configuration logic would normalize to default, while
some of the proxycfg logic would normalize to match the parent service. Due to
this mismatch in logic, an incorrect service-defaults configuration entry would
be fetched and have its protocol applied to the upstream.
* updated nav; renamed L7 traffic folder

* Added locality-aware routing to traffic mgmt overview

* Added route to local upstreams topic

* Updated agent configuration reference

* Added locality param to services conf ref

* Added locality param to conf entries

* mentioned traffic management in proxies overview

* added locality-aware to failover overview

* added docs for service rate limiting

* updated service defaults conf entry

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <[email protected]>
Co-authored-by: Chris S. Kim <[email protected]>

* updated links and added redirects

---------

Co-authored-by: Chris S. Kim <[email protected]>
Co-authored-by: Jeff Boruszak <[email protected]>
* logs for debugging

* Init

* white spaces fix

* added change log

* Fix tests

* fix typo

* using queryoptionfilter to populate args.filter

* tests

* fix test

* fix tests

* fix tests

* fix tests

* fix tests

* fix variable name

* fix tests

* fix tests

* fix tests

* Update .changelog/18322.txt

Co-authored-by: Ganesh S <[email protected]>

* fix change log

* address nits

* removed unused line

* doing join only when filter has nodemeta

* fix tests

* fix tests

* Update agent/consul/catalog_endpoint.go

Co-authored-by: R.B. Boyer <[email protected]>

* fix tests

* removed unwanted code

---------

Co-authored-by: Ganesh S <[email protected]>
Co-authored-by: R.B. Boyer <[email protected]>
stop windows integration tests
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/consul-collector-reduce-flush-intervals/likely-loved-hog branch from 8f220e2 to 59e62c8 Compare November 20, 2023 21:19
@hc-github-team-consul-core hc-github-team-consul-core requested review from modrake and dlaguerta and removed request for a team November 20, 2023 21:19
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/consul-collector-reduce-flush-intervals/likely-loved-hog branch from 34f9d28 to c567889 Compare November 20, 2023 21:19

Auto approved Consul Bot automated PR

@github-actions github-actions bot added type/docs Documentation needs to be created/updated/clarified theme/api Relating to the HTTP API interface theme/health-checks Health Check functionality theme/acls ACL and token generation theme/cli Flags and documentation for the CLI interface theme/config Relating to Consul Agent configuration, including reloading theme/ui Anything related to the UI theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication theme/telemetry Anything related to telemetry or observability type/ci Relating to continuous integration (CI) tooling for testing or releases pr/dependencies PR specifically updates dependencies of project theme/envoy/xds Related to Envoy support theme/contributing Additions and enhancements to community contributing materials theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/consul-terraform-sync Relating to Consul Terraform Sync and Network Infrastructure Automation labels Nov 20, 2023
@vercel vercel bot temporarily deployed to Preview – consul November 20, 2023 21:30 Inactive
@johnbuonassisi

Closing to manually create a backport in another PR
