Community Note
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
There are a couple of problems with the ServiceRouter CRDs in consul-k8s. I will try to summarize them all in this ticket because they are somewhat related and can probably be fixed with the same effort on your side.
1) When specifying routes[].destination.requestTimeout as 180s, consul-k8s actually persists it as 3m0s. When we use tools like ArgoCD to make sure manifests have not changed, that causes a diff between the manifest we submitted (180s) and what is persisted in the CRD (3m0s). That is fairly easy to work around by changing our manifests to use 3m0s, but it leads to number 2). A sketch of where the normalization likely comes from is below.
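For reference, a minimal Go snippet showing where the 3m0s form most likely comes from, assuming consul-k8s round-trips the field through Go's time.Duration (as Kubernetes metav1.Duration fields do):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// "180s" parses fine, but Duration.String() re-renders the value using
	// the largest whole units, which is exactly the "3m0s" that ends up
	// persisted in the CRD and trips the ArgoCD diff.
	d, err := time.ParseDuration("180s")
	if err != nil {
		panic(err)
	}
	fmt.Println(d.String()) // prints "3m0s"
}
```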
2) It seems we always have to specify idleTimeout: 0s explicitly, because if we omit it, the CRD defaults to that value anyway, which also causes diffs in ArgoCD. Again, we have to work around it by asking engineers to add that value even though it makes no sense for their destination logic. A guess at why the default appears is sketched below.
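Similarly, a guess at why the 0s default shows up, assuming the field is stored as a plain (non-pointer) Go duration whose zero value serializes as "0s":

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// The zero value of time.Duration formats as "0s". If idleTimeout is a
	// non-pointer duration field, omitting it in the manifest still
	// round-trips back as "0s", which ArgoCD then reports as a diff.
	var idle time.Duration
	fmt.Println(idle.String()) // prints "0s"
}
```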
3) Which leads us to the main problem: when running with WAN federation, there is only ever one admin partition, and it is called "default". consul-k8s defaults the partition to an empty value, but the Consul server 1.5.x sets it to the value "default", so your configEntry.MatchesConsul(entry) check fails even though nothing is actually different between Consul and Kubernetes. A sketch of the failing comparison is below.
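To illustrate the failing comparison (using a hypothetical struct for this sketch, not the real consul-k8s types), go-cmp treats the two entries as different solely because of the server-populated partition:

```go
package main

import (
	"fmt"

	"github.com/google/go-cmp/cmp"
)

// routeDestination is a hypothetical stand-in for the config-entry fields
// being compared; the real check lives behind configEntry.MatchesConsul.
type routeDestination struct {
	Service   string
	Partition string
}

func main() {
	fromKube := routeDestination{Service: "feature-flags"}                         // CRD leaves Partition empty
	fromConsul := routeDestination{Service: "feature-flags", Partition: "default"} // server fills in "default"

	fmt.Println(cmp.Equal(fromKube, fromConsul)) // false, so the sync loop never converges
	fmt.Println(cmp.Diff(fromKube, fromConsul))  // pinpoints the Partition mismatch
}
```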
If we just decide to set partition: default in our CRDs without any of the above patches, we get this error:
error when replacing "/dev/shm/2022498401": admission webhook "mutate-servicerouter.consul.hashicorp.com" denied the request: servicerouter.consul.hashicorp.com "feature-flags-http-with-migrations" is invalid: spec.routes[0].destination.partition: Invalid value: "default": Consul Enterprise partitions must be enabled to set destination.partition
So we cannot use admin partitions because we use WAN federation, but Consul is setting the partition to "default" while consul-k8s does not think partitions are enabled; it is kind of a vicious cycle.
Running a diff instead of cmp shows the problem, as sketched above.

Reproduction Steps
Expected behavior
The expected behavior was not to have an infinite sync loop in the consul-connect-injector pods.
Environment details
consul-k8s version 1.2.3
Additional Context
This is the diff with which I was at least able to stop it from syncing all the time: