feat(kuma-cp): implement delta xDS for envoy config exchange #11296

Draft
wants to merge 31 commits into
base: master

lukidzi
Contributor

@lukidzi lukidzi commented Sep 3, 2024

Motivation

Envoy supports incremental xDS, which sends only changes rather than the entire state.

Implementation information

  • Ran a Kuma two-cluster multizone setup with incremental xDS. The change "fix(delta): force push EDS once CDS sent for Ads" (go-control-plane#4) is still needed in general, but it is not required for Envoy 1.32+, which supports the Endpoints cache when initial_fetch_timeout is set.
  • Extracted common logic in each callback to support both communication types.
  • Each callback now uses a separate map to store connection information, depending on mode, preventing stream conflicts.
  • Added a global and dataplane-only flag to toggle incremental xDS for all DPPs or a single DPP.

Supporting documentation

https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#incremental-xds

Fix #XX

lukidzi added 5 commits August 4, 2024 20:12
Signed-off-by: Lukasz Dziedziak <[email protected]>
s
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi added ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) ci/skip-e2e-test PR: Don't run e2e tests labels Sep 3, 2024
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi removed ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) ci/skip-e2e-test PR: Don't run e2e tests labels Nov 5, 2024
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi changed the title feat(kuma-cp): implement delta xDS for envoy config exchange feat(kuma-cp): implement delta xDS for envoy config exchange Nov 5, 2024
@lukidzi lukidzi marked this pull request as ready for review November 5, 2024 15:47
@lukidzi lukidzi requested a review from a team as a code owner November 5, 2024 15:47
@lukidzi lukidzi requested review from Automaat and bartsmykla and removed request for a team November 5, 2024 15:47
Signed-off-by: Lukasz Dziedziak <[email protected]>
@Icarus9913 Icarus9913 added the ci/run-build PR: build the artifacts too label Nov 6, 2024
@Icarus9913
Contributor

Is this case possible?
2.9.0(CP|DP) --> 2.10.0(CP|DP) --> 2.9.0(CP), 2.10.0(DP)

We only allow a higher-version CP with a lower-version DP, right?

Comment on lines 130 to 132
switch request.XdsConfigMode {
case types.DELTA:
	params.UseDelta = true
Contributor

If the CP has UseDeltaXds disabled and the DP requests DeltaXDSMode, should we return an error to the DP?

Contributor Author

No, we should allow individual DPPs to run with Delta. This approach enables a progressive migration. UseDeltaXds enforces Delta mode for all DPPs but still allows individual DPPs to revert to the previous mode if needed.

Contributor

OK, I understand it now. In the file pkg/xds/server/v3/components.go we enable the deltaServer by default. I think we should rename the variable Experimental.UseDeltaXds, because one might think it is a switch that enables or disables the whole delta xDS feature.

So we have these configurations:

  1. The delta server runs by default.
  2. The CP option Experimental.UseDeltaXds force-enables delta xDS for all DPPs.
  3. The DP environment variable KUMA_EXPERIMENTAL_USE_DELTA_XDS chooses between SOTW and delta.

Contributor Author

I think the DPP environment variable is clear, and it shouldn't be a boolean. If it were, we wouldn’t be able to distinguish between using SOTW and "Not Defined." For the control plane configuration, you're right: I need to find a better name for it.
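
The tri-state idea can be sketched as follows. This is only an illustration of the precedence described in this thread (an explicit per-DPP choice wins, otherwise the control-plane-wide flag decides); the type name, the constant names, and the string values "sotw" and "delta" are assumptions, not the identifiers used in the PR.

```go
package main

import "fmt"

// XdsMode is a hypothetical tri-state: the empty value means "not defined",
// which a plain boolean could not express.
type XdsMode string

const (
	ModeNotDefined XdsMode = ""
	ModeSotW       XdsMode = "sotw"  // assumed value, for illustration only
	ModeDelta      XdsMode = "delta" // assumed value, for illustration only
)

// EffectiveMode resolves the mode for a single DPP. An explicit per-DPP
// setting always wins; otherwise the control-plane-wide flag decides.
func EffectiveMode(dpMode XdsMode, cpUseDelta bool) XdsMode {
	if dpMode != ModeNotDefined {
		return dpMode
	}
	if cpUseDelta {
		return ModeDelta
	}
	return ModeSotW
}

func main() {
	fmt.Println(EffectiveMode(ModeNotDefined, true)) // CP default applies
	fmt.Println(EffectiveMode(ModeSotW, true))       // DPP reverts to SotW
	fmt.Println(EffectiveMode(ModeDelta, false))     // DPP opts in to delta
}
```

With a boolean, the second and third calls could not be told apart from "use whatever the control plane says", which is the reason given above for not using one.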

Contributor

Or Just APIType:


// APIs may be fetched via either REST or gRPC.
type ApiConfigSource_ApiType int32

const (
	// Ideally this would be 'reserved 0' but one can't reserve the default
	// value. Instead we throw an exception if this is ever used.
	//
	// Deprecated: Marked as deprecated in envoy/config/core/v3/config_source.proto.
	ApiConfigSource_DEPRECATED_AND_UNAVAILABLE_DO_NOT_USE ApiConfigSource_ApiType = 0
	// REST-JSON v2 API. The `canonical JSON encoding
	// <https://developers.google.com/protocol-buffers/docs/proto3#json>`_ for
	// the v2 protos is used.
	ApiConfigSource_REST ApiConfigSource_ApiType = 1
	// SotW gRPC service.
	ApiConfigSource_GRPC ApiConfigSource_ApiType = 2
	// Using the delta xDS gRPC service, i.e. DeltaDiscovery{Request,Response}
	// rather than Discovery{Request,Response}. Rather than sending Envoy the entire state
	// with every update, the xDS server only sends what has changed since the last update.
	ApiConfigSource_DELTA_GRPC ApiConfigSource_ApiType = 3
	// SotW xDS gRPC with ADS. All resources which resolve to this configuration source will be
	// multiplexed on a single connection to an ADS endpoint.
	// [#not-implemented-hide:]
	ApiConfigSource_AGGREGATED_GRPC ApiConfigSource_ApiType = 5
	// Delta xDS gRPC with ADS. All resources which resolve to this configuration source will be
	// multiplexed on a single connection to an ADS endpoint.
	// [#not-implemented-hide:]
	ApiConfigSource_AGGREGATED_DELTA_GRPC ApiConfigSource_ApiType = 6
)

In which case I'd use the values DELTA_GRPC and GRPC. By the way, how come we're not using AGGREGATED_GRPC?

Contributor Author

It's because we set it here: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/config_source.proto#envoy-v3-api-msg-config-core-v3-apiconfigsource

	AdsConfig: &envoy_core_v3.ApiConfigSource{
		ApiType:                   configType,
		TransportApiVersion:       envoy_core_v3.ApiVersion_V3,
		SetNodeOnFirstMessageOnly: true,
		GrpcServices: []*envoy_core_v3.GrpcService{
			buildGrpcService(parameters, enableReloadableTokens),
		},
	},

Here, ApiType can be any API type (gRPC, REST, delta gRPC).
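
A minimal sketch of how configType in the snippet above might be chosen. The ApiType type here is a local stand-in for envoy_core_v3.ApiConfigSource_ApiType so the example compiles without go-control-plane; the numeric values mirror the enum quoted earlier in this thread, and the helper name adsAPIType is hypothetical.

```go
package main

import "fmt"

// ApiType is a local stand-in for envoy_core_v3.ApiConfigSource_ApiType.
// The values mirror the generated enum quoted earlier in the thread.
type ApiType int32

const (
	ApiTypeGRPC      ApiType = 2 // SotW gRPC service
	ApiTypeDeltaGRPC ApiType = 3 // delta xDS gRPC service
)

// adsAPIType is a hypothetical helper that picks the ApiType placed in
// AdsConfig, depending on whether delta xDS was requested for this proxy.
func adsAPIType(useDelta bool) ApiType {
	if useDelta {
		return ApiTypeDeltaGRPC
	}
	return ApiTypeGRPC
}

func main() {
	fmt.Println(adsAPIType(false), adsAPIType(true)) // prints "2 3"
}
```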

Contributor Author

This is looking interesting! One thing that isn't clear to me is how you do the opt-in on Kubernetes. How do I set the dataplane env var to just opt-in 1 deployment?

It was my mistake not to validate a single Kubernetes resource with Delta. I had to introduce a new struct to the Dataplane object, which allows setting this configuration. I fixed it, and now it’s possible to enable Delta using an annotation on the Kubernetes resource.

Contributor

@lahabana lahabana left a comment

I didn't look at the implementation, only the API.

This is looking interesting! One thing that isn't clear to me is how you do the opt-in on Kubernetes. How do I set the dataplane env var to just opt-in 1 deployment?

pkg/config/app/kuma-cp/config.go (outdated, resolved)
pkg/xds/bootstrap/types/bootstrap_request.go (resolved)
Signed-off-by: Lukasz Dziedziak <[email protected]>
@Icarus9913 Icarus9913 removed the ci/run-build PR: build the artifacts too label Nov 12, 2024
Icarus9913
Icarus9913 previously approved these changes Nov 12, 2024
@lukidzi lukidzi added the ci/run-build PR: build the artifacts too label Nov 18, 2024
Contributor

@slonka slonka left a comment

A couple of Qs but really happy that we're going to be shipping this 🚢 . Also does go-control-plane handle the stuff like hashing individual resources instead of whole arrays?

@@ -68,4 +69,7 @@ message ZoneIngress {
// AvailableService contains tags that represent unique subset of
// endpoints
repeated AvailableService availableServices = 3;

// EnvoyConfiguration provides additional configuration for the Envoy sidecar.
EnvoyConfiguration envoy = 4;
Contributor

Why do we need that in zone ingress? (I might answer that question myself later)

Contributor Author

It's because Egress and Ingress can also use delta for configuration exchange.

pkg/hds/tracker/callbacks.go (resolved)
@@ -38,6 +38,7 @@ func NewDefaultBootstrapGenerator(
enableReloadableTokens bool,
hdsEnabled bool,
defaultAdminPort uint32,
deltaXdsEnabled bool,
Contributor

Do we need a builder for this? It's getting quite packed with parameters.

Contributor Author

We could change it, but probably in a separate PR
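
For reference, one shape such a follow-up could take is the functional-options pattern. This is a sketch only: the struct, the With* helpers, and the constructor name are hypothetical, the fields simply mirror the parameters visible in the diff above, and the port value is purely illustrative.

```go
package main

import "fmt"

// BootstrapGeneratorConfig is a hypothetical parameter object; its fields
// mirror the constructor arguments visible in the diff.
type BootstrapGeneratorConfig struct {
	EnableReloadableTokens bool
	HdsEnabled             bool
	DefaultAdminPort       uint32
	DeltaXdsEnabled        bool
}

// Option sets one field of the config.
type Option func(*BootstrapGeneratorConfig)

// WithDeltaXds toggles delta xDS in the generated bootstrap.
func WithDeltaXds(enabled bool) Option {
	return func(c *BootstrapGeneratorConfig) { c.DeltaXdsEnabled = enabled }
}

// WithDefaultAdminPort sets the default Envoy admin port.
func WithDefaultAdminPort(port uint32) Option {
	return func(c *BootstrapGeneratorConfig) { c.DefaultAdminPort = port }
}

// NewBootstrapGeneratorConfig applies the options over zero-value defaults.
func NewBootstrapGeneratorConfig(opts ...Option) BootstrapGeneratorConfig {
	var cfg BootstrapGeneratorConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := NewBootstrapGeneratorConfig(WithDeltaXds(true), WithDefaultAdminPort(9901))
	fmt.Printf("%+v\n", cfg)
}
```

New settings then become new With* functions instead of additional positional arguments, so existing call sites do not need to change.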

Comment on lines 165 to 169
if isDelta {
	d.dpDeltaStreams[streamID] = dpStream
} else {
	d.dpStreams[streamID] = dpStream
}
Contributor

Maybe instead of this repeated if/else you could have a generic function that takes either dpStreams or dpDeltaStreams? This applies to all functions with isDelta.

Contributor Author

I've changed it to pass a function as a parameter. You can take a look.

@lukidzi
Contributor Author

lukidzi commented Nov 20, 2024

A couple of Qs but really happy that we're going to be shipping this 🚢 . Also does go-control-plane handle the stuff like hashing individual resources instead of whole arrays?

Yes, it should compare individual resources, so only the one changed endpoint will be sent. I'm not sure whether that answers the question.
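
To make the per-resource idea concrete, here is a small sketch of diffing by individual resource version instead of hashing the whole set. It is not go-control-plane's implementation; the function names, the use of SHA-256, and the sample resource names and addresses are assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// versionOf hashes the serialized bytes of a single resource.
func versionOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// diff compares the per-resource versions the client already holds with the
// current snapshot. It returns the names whose content is new or changed and
// the names that no longer exist.
func diff(clientVersions map[string]string, snapshot map[string][]byte) (changed, removed []string) {
	for name, data := range snapshot {
		if clientVersions[name] != versionOf(data) {
			changed = append(changed, name)
		}
	}
	for name := range clientVersions {
		if _, ok := snapshot[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(changed)
	sort.Strings(removed)
	return changed, removed
}

func main() {
	client := map[string]string{
		"backend": versionOf([]byte("10.0.0.1:80")),
		"web":     versionOf([]byte("10.0.0.2:80")),
		"old":     versionOf([]byte("10.0.0.9:80")),
	}
	snapshot := map[string][]byte{
		"backend": []byte("10.0.0.1:80"), // unchanged, not resent
		"web":     []byte("10.0.0.3:80"), // endpoint changed
	}
	changed, removed := diff(client, snapshot)
	fmt.Println(changed, removed) // prints "[web] [old]"
}
```

With a single hash over the whole array, the change to "web" would force every resource to be resent; with per-resource versions only "web" is sent and "old" is reported as removed.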

@lukidzi lukidzi removed the ci/run-build PR: build the artifacts too label Nov 20, 2024
Signed-off-by: Lukasz Dziedziak <[email protected]>
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi
Contributor Author

lukidzi commented Nov 22, 2024

I think there is still an edge case causing flakiness; I need to investigate.

Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi marked this pull request as draft December 4, 2024 18:35
4 participants