Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NET-6588] Remove abandoned virtual nodes from Consul Catalog #3307

Merged
merged 1 commit into from
Dec 11, 2023

Conversation

curtbushko
Copy link
Contributor

@curtbushko curtbushko commented Dec 5, 2023

Changes proposed in this PR:

  • When deregistering a service in Consul we would leave the virtual node sitting around
  • This PR checks if a node has an empty service list and if so deletes

How I've tested this PR:

  • manually testing in kind to make sure nodes are cleaned up
  • unit tests. The enpoints controller tests have a lot of coverage for service interactions.

How I expect reviewers to test this PR:

👀

Checklist:

@curtbushko curtbushko force-pushed the curtbushko/NET-6588-abandoned-nodes branch from f4f5bff to 08b9477 Compare December 7, 2023 16:28
@curtbushko curtbushko changed the title try in CI [NET-6588] Remove abandoned virtual nodes from Consul Catalog Dec 7, 2023
@curtbushko curtbushko force-pushed the curtbushko/NET-6588-abandoned-nodes branch from 08b9477 to 109a3aa Compare December 7, 2023 17:02
@curtbushko curtbushko self-assigned this Dec 7, 2023
@curtbushko curtbushko added backport/1.1.x Backport to release/1.1.x branch backport/1.2.x This release branch is no longer active. backport/1.3.x This release branch is no longer active. labels Dec 7, 2023
@curtbushko curtbushko marked this pull request as ready for review December 7, 2023 17:03
@curtbushko curtbushko force-pushed the curtbushko/NET-6588-abandoned-nodes branch from 033d4df to a67b607 Compare December 7, 2023 22:28
// (wildcard) to determine if there any other services associated with the Node. We also only search for nodes that have
// the correct kubernetes metadata (managed-by-endpoints-controller and synthetic-node).
func (r *Controller) deregisterNode(apiClient *api.Client, k8sServiceName, k8sServiceNamespace, nodeName string) error {
nodeServices, err := r.serviceInstancesForK8SServiceNameAndNamespace(apiClient, k8sServiceName, k8sServiceNamespace, nodeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is the wrong function because it filters on the k8sServiceName, but to clean the node we need to know if ANY service is associated with the node.

Copy link
Contributor Author

@curtbushko curtbushko Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting all the services associated with that node with:

If I am reading the filter correctly it is:

give me the node with:
name = "foo-virtual"
meta {
"k8s-service" = "static-server"
"k8s-namespace" = "default"
"managed-by" = "endpoints-controller"
}

filter := fmt.Sprintf(Meta[%q] == %q and Meta[%q] == %q and Meta[%q] == %q, metaKeyKubeServiceName, k8sServiceName, constants.MetaKeyKubeNS, k8sServiceNamespace, metaKeyManagedBy, constants.ManagedByValue)

serviceList, _, err = apiClient.Catalog().NodeServiceList(nodeName, &api.QueryOptions{Filter: filter})

I think we do it with a filter like this instead of node because Consul would be slow to return without the filter.

@komapa
Copy link

komapa commented Dec 8, 2023

Let me know if you want me to run this in a real cluster which can reproduce this problem many times per day :)

@curtbushko
Copy link
Contributor Author

Let me know if you want me to run this in a real cluster which can reproduce this problem many times per day :)

@komapa Thank you!

You can grab the image at: docker.io/curtbushko/consul-k8s-control-plane-dev:abandoned-nodes

(I tagged you late on Friday but I found a bug that I had to fix. It should be better now)

Remove virtual nodes that do not have services registered
@curtbushko curtbushko force-pushed the curtbushko/NET-6588-abandoned-nodes branch from d876589 to c813068 Compare December 11, 2023 15:36
Copy link
Contributor

@thisisnotashwin thisisnotashwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this. I have minor concerns about the performance of the endpoints controller given we have steadily increased what happens during the reconciles, but i think that requires a different, more thorough introspection into what we do in the endpoints controller. This is great work right here though.

@curtbushko
Copy link
Contributor Author

I really like this. I have minor concerns about the performance of the endpoints controller given we have steadily increased what happens during the reconciles, but i think that requires a different, more thorough introspection into what we do in the endpoints controller. This is great work right here though.

@thisisnotashwin Zalimeni has a PR in draft to help with the endpoints controller performance. I do think that my PR might help with performance as we would have fewer abandoned nodes to scan through on each reconcile.

@thisisnotashwin
Copy link
Contributor

I really like this. I have minor concerns about the performance of the endpoints controller given we have steadily increased what happens during the reconciles, but i think that requires a different, more thorough introspection into what we do in the endpoints controller. This is great work right here though.

@thisisnotashwin Zalimeni has a PR in draft to help with the endpoints controller performance. I do think that my PR might help with performance as we would have fewer abandoned nodes to scan through on each reconcile.

yayyy!! that definitely helps. i think this multipronged approach is the way forward!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1.x Backport to release/1.1.x branch backport/1.2.x This release branch is no longer active. backport/1.3.x This release branch is no longer active.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants