
PLT-2568 use all hosts available for es for connection pooling, node status visibility and retries #3747

Merged
merged 1 commit into beta
Nov 15, 2024

Conversation

sriram-atlan

Currently, requests to the ES nodes are round-robined through the Kubernetes Service. Instead, we want to use the Elasticsearch client's own load balancing. This change requires a matching Atlas config update to go along with it:

atlas.graph.index.search.hostname=atlas-elasticsearch-master-0.svc.cluster.local:9200,atlas-elasticsearch-master-1.svc.cluster.local:9200,atlas-elasticsearch-master-2.svc.cluster.local:9200
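As a rough illustration of how that comma-separated property could be turned into per-node endpoints on the Atlas side, here is a minimal, hypothetical sketch (the class and method names are illustrative, not Atlas's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split the comma-separated
// atlas.graph.index.search.hostname value into host:port endpoints,
// one per Elasticsearch node, instead of a single Service endpoint.
public class EsHostParser {
    public static List<String[]> parse(String property) {
        List<String[]> endpoints = new ArrayList<>();
        for (String entry : property.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            // Default to 9200 when no port is given.
            String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
            int port = colon >= 0 ? Integer.parseInt(trimmed.substring(colon + 1)) : 9200;
            endpoints.add(new String[] { host, String.valueOf(port) });
        }
        return endpoints;
    }

    public static void main(String[] args) {
        for (String[] h : parse(
                "atlas-elasticsearch-master-0.svc.cluster.local:9200,"
              + "atlas-elasticsearch-master-1.svc.cluster.local:9200")) {
            System.out.println(h[0] + " -> " + h[1]);
        }
    }
}
```

Each resulting host/port pair would then be handed to the Elasticsearch client as a separate node.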

Kubernetes Load Balancing vs. Elasticsearch Client Load Balancing

•	Kubernetes Service Load Balancing: When you configure a Kubernetes Service (such as a ClusterIP or LoadBalancer) to front multiple Elasticsearch pods, Kubernetes will handle the load balancing automatically. Requests sent to the Service IP or DNS name will be distributed across the available pods, and you only need to specify the Service’s endpoint in your code.
•	Elasticsearch RestClient Load Balancing: By providing multiple HttpHost instances to the Elasticsearch RestClient, the client will directly handle round-robin load balancing across those nodes, independent of Kubernetes. This approach is beneficial if you’re running outside Kubernetes or if you want more control over load balancing directly within the client.
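The client-side round robin described above can be sketched in a few lines. This is a simplified stand-in for what the RestClient does internally, not its actual implementation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of client-side round robin: each request picks
// the next host in rotation, with no intermediate Service hop.
public class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger cursor = new AtomicInteger(0);

    public RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
    }

    public String next() {
        // floorMod keeps the index valid even after integer overflow.
        int i = Math.floorMod(cursor.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }
}
```

The AtomicInteger makes the rotation safe under concurrent callers, which matters because the RestClient shares one host list across all requests.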
  1. Failure Detection and Retry Mechanisms

    • Kubernetes Service: Kubernetes automatically detects when a pod goes down and removes it from the Service endpoint list until it is healthy again, thereby routing traffic only to healthy pods. However, depending on the configuration, Kubernetes may not handle transient connection errors as well as Elasticsearch’s client, which has built-in retry mechanisms.
    • Elasticsearch Client: The Elasticsearch RestClient can detect and reroute failed requests if a node goes down, thanks to its retry logic and round-robin mechanism. It allows you to control how retries, failovers, and timeouts are handled specifically within the context of Elasticsearch.
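The failover behavior in point 1 can be sketched as "try each host in order, move on when one fails." This is a simplified model, not the RestClient's real retry code; the request function here is a stand-in for an actual HTTP call:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of client-side failover: try each host in turn,
// falling through to the next on failure, and rethrow only if all fail.
public class FailoverExecutor {
    public static <T> T execute(List<String> hosts, Function<String, T> request) {
        RuntimeException last = null;
        for (String host : hosts) {
            try {
                return request.apply(host);   // success: return immediately
            } catch (RuntimeException e) {
                last = e;                     // failure: try the next host
            }
        }
        throw last != null ? last : new RuntimeException("no hosts configured");
    }
}
```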

  2. Direct Node Awareness

    • Kubernetes Service: The Service abstracts the nodes, and your application does not directly know about individual Elasticsearch nodes or their health status. This setup keeps your application configuration simpler but limits finer-grained control over node selection.
    • Elasticsearch Client with Multiple Nodes: When specifying multiple nodes directly, the RestClient is aware of each node’s status. This visibility can provide better resiliency and failover since the client will avoid routing traffic to known failed nodes immediately.
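The per-node status tracking in point 2 roughly works like a blacklist with a revival deadline: a failed node is skipped until a timeout passes, then tried again. A minimal sketch of that idea (again hypothetical, not the client's actual data structures):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of node status tracking: nodes marked dead are
// skipped until a revival deadline passes, then become eligible again.
public class NodeStatusTracker {
    private final Map<String, Long> deadUntil = new HashMap<>();
    private final long revivalMillis;

    public NodeStatusTracker(long revivalMillis) {
        this.revivalMillis = revivalMillis;
    }

    public void markDead(String node, long nowMillis) {
        deadUntil.put(node, nowMillis + revivalMillis);
    }

    public boolean isUsable(String node, long nowMillis) {
        Long until = deadUntil.get(node);
        // Unknown nodes are assumed healthy; dead nodes revive after the deadline.
        return until == null || nowMillis >= until;
    }
}
```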

  3. Connection Pooling and Efficiency

    • Kubernetes Service: Using a single Service endpoint results in connection pooling across that one endpoint. While Kubernetes will distribute requests among pods, it may not balance connections perfectly due to the intermediate layer of load balancing.
    • Elasticsearch Client: By configuring the client with multiple HttpHost nodes, the RestClient manages the connection pool for each node directly, leading to more efficient use of connections and potentially reducing latencies due to the client’s more optimized, internal load-balancing mechanisms.
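The per-host pooling in point 3 can be pictured as one pool of reusable connections keyed by host, rather than a single pool behind one Service endpoint. A hypothetical sketch (connections modeled as plain strings for simplicity):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-host connection pooling: each configured
// host gets its own pool of idle connections to reuse.
public class PerHostPool {
    private final Map<String, Deque<String>> pools = new HashMap<>();

    public String acquire(String host) {
        Deque<String> pool = pools.computeIfAbsent(host, h -> new ArrayDeque<>());
        String conn = pool.poll();
        return conn != null ? conn : "conn->" + host;  // reuse idle, else open new
    }

    public void release(String host, String conn) {
        pools.computeIfAbsent(host, h -> new ArrayDeque<>()).push(conn);
    }

    public int idleCount(String host) {
        Deque<String> pool = pools.get(host);
        return pool == null ? 0 : pool.size();
    }
}
```

Because each host has its own pool, a slow or dead node cannot starve connections destined for the healthy ones.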

Choosing Between Kubernetes Service and Multiple Hosts in Code:

•	If you prefer simplicity and are running in Kubernetes, using a single Service endpoint may be sufficient, as Kubernetes will handle load balancing and failover.
•	If you need more control over failover behavior, retries, or want fine-grained visibility into node status and client-managed load balancing, configuring multiple nodes directly in the Elasticsearch client might be more beneficial.

Type of change

  • Bug fix (fixes an issue)
  • New feature (adds functionality)

Related issues

Fix #1

Checklists

Development

  • Lint rules pass locally
  • Application changes have been tested thoroughly
  • Automated tests covering modified code pass

Security

  • Security impact of change has been considered
  • Code follows company security practices and guidelines

Code review

  • Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary
  • "Ready for review" label attached and reviewers assigned
  • Changes have been reviewed by at least one other contributor
  • Pull request linked to task tracker where applicable

@sriram-atlan sriram-atlan merged commit ae47560 into beta Nov 15, 2024
5 checks passed
@sumandas0 sumandas0 deleted the UseMultipleESHosts branch November 15, 2024 07:30
@sumandas0 sumandas0 restored the UseMultipleESHosts branch November 15, 2024 07:30