
PLT-2568 use all hosts available for es for connection pooling, node status visibility and retries #3747

Merged
merged 1 commit into beta
Nov 15, 2024

Conversation

sriram-atlan

Currently, requests to the ES nodes are round-robined through the Kubernetes Service. Instead, we want to use the Elasticsearch client's own load balancing. This change requires a matching Atlas config update to go along with it:

atlas.graph.index.search.hostname=atlas-elasticsearch-master-0.svc.cluster.local:9200,atlas-elasticsearch-master-1.svc.cluster.local:9200,atlas-elasticsearch-master-2.svc.cluster.local:9200
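As a rough illustration of how that comma-separated property could be turned into per-node endpoints on the Atlas side, here is a minimal, hypothetical sketch (the class and method names are illustrative, not Atlas's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split the comma-separated
// atlas.graph.index.search.hostname value into host:port endpoints,
// one per Elasticsearch node, instead of a single Service endpoint.
public class EsHostParser {
    public static List<String[]> parse(String property) {
        List<String[]> endpoints = new ArrayList<>();
        for (String entry : property.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            // Default to 9200 when no port is given.
            String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
            int port = colon >= 0 ? Integer.parseInt(trimmed.substring(colon + 1)) : 9200;
            endpoints.add(new String[] { host, String.valueOf(port) });
        }
        return endpoints;
    }

    public static void main(String[] args) {
        for (String[] h : parse(
                "atlas-elasticsearch-master-0.svc.cluster.local:9200,"
              + "atlas-elasticsearch-master-1.svc.cluster.local:9200")) {
            System.out.println(h[0] + " -> " + h[1]);
        }
    }
}
```

Each resulting host/port pair would then be handed to the Elasticsearch client as a separate node.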

Kubernetes Load Balancing vs. Elasticsearch Client Load Balancing

•	Kubernetes Service Load Balancing: When you configure a Kubernetes Service (such as a ClusterIP or LoadBalancer) to front multiple Elasticsearch pods, Kubernetes will handle the load balancing automatically. Requests sent to the Service IP or DNS name will be distributed across the available pods, and you only need to specify the Service’s endpoint in your code.
•	Elasticsearch RestClient Load Balancing: By providing multiple HttpHost instances to the Elasticsearch RestClient, the client will directly handle round-robin load balancing across those nodes, independent of Kubernetes. This approach is beneficial if you’re running outside Kubernetes or if you want more control over load balancing directly within the client.
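The client-side round robin described above can be sketched in a few lines. This is a simplified stand-in for what the RestClient does internally, not its actual implementation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of client-side round robin: each request picks
// the next host in rotation, with no intermediate Service hop.
public class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger cursor = new AtomicInteger(0);

    public RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
    }

    public String next() {
        // floorMod keeps the index valid even after integer overflow.
        int i = Math.floorMod(cursor.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }
}
```

The AtomicInteger makes the rotation safe under concurrent callers, which matters because the RestClient shares one host list across all requests.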
  1. Failure Detection and Retry Mechanisms

    • Kubernetes Service: Kubernetes automatically detects when a pod goes down and removes it from the Service endpoint list until it is healthy again, thereby routing traffic only to healthy pods. However, depending on the configuration, Kubernetes may not handle transient connection errors as well as Elasticsearch’s client, which has built-in retry mechanisms.
    • Elasticsearch Client: The Elasticsearch RestClient can detect and reroute failed requests if a node goes down, thanks to its retry logic and round-robin mechanism. It allows you to control how retries, failovers, and timeouts are handled specifically within the context of Elasticsearch.
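The failover behavior in point 1 can be sketched as "try each host in order, move on when one fails." This is a simplified model, not the RestClient's real retry code; the request function here is a stand-in for an actual HTTP call:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of client-side failover: try each host in turn,
// falling through to the next on failure, and rethrow only if all fail.
public class FailoverExecutor {
    public static <T> T execute(List<String> hosts, Function<String, T> request) {
        RuntimeException last = null;
        for (String host : hosts) {
            try {
                return request.apply(host);   // success: return immediately
            } catch (RuntimeException e) {
                last = e;                     // failure: try the next host
            }
        }
        throw last != null ? last : new RuntimeException("no hosts configured");
    }
}
```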

  2. Direct Node Awareness

    • Kubernetes Service: The Service abstracts the nodes, and your application does not directly know about individual Elasticsearch nodes or their health status. This setup keeps your application configuration simpler but limits finer-grained control over node selection.
    • Elasticsearch Client with Multiple Nodes: When specifying multiple nodes directly, the RestClient is aware of each node’s status. This visibility can provide better resiliency and failover since the client will avoid routing traffic to known failed nodes immediately.
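The per-node status tracking in point 2 roughly works like a blacklist with a revival deadline: a failed node is skipped until a timeout passes, then tried again. A minimal sketch of that idea (again hypothetical, not the client's actual data structures):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of node status tracking: nodes marked dead are
// skipped until a revival deadline passes, then become eligible again.
public class NodeStatusTracker {
    private final Map<String, Long> deadUntil = new HashMap<>();
    private final long revivalMillis;

    public NodeStatusTracker(long revivalMillis) {
        this.revivalMillis = revivalMillis;
    }

    public void markDead(String node, long nowMillis) {
        deadUntil.put(node, nowMillis + revivalMillis);
    }

    public boolean isUsable(String node, long nowMillis) {
        Long until = deadUntil.get(node);
        // Unknown nodes are assumed healthy; dead nodes revive after the deadline.
        return until == null || nowMillis >= until;
    }
}
```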

  3. Connection Pooling and Efficiency

    • Kubernetes Service: Using a single Service endpoint results in connection pooling across that one endpoint. While Kubernetes will distribute requests among pods, it may not balance connections perfectly due to the intermediate layer of load balancing.
    • Elasticsearch Client: By configuring the client with multiple HttpHost nodes, the RestClient manages the connection pool for each node directly, leading to more efficient use of connections and potentially reducing latencies due to the client’s more optimized, internal load-balancing mechanisms.
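The per-host pooling in point 3 can be pictured as one pool of reusable connections keyed by host, rather than a single pool behind one Service endpoint. A hypothetical sketch (connections modeled as plain strings for simplicity):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-host connection pooling: each configured
// host gets its own pool of idle connections to reuse.
public class PerHostPool {
    private final Map<String, Deque<String>> pools = new HashMap<>();

    public String acquire(String host) {
        Deque<String> pool = pools.computeIfAbsent(host, h -> new ArrayDeque<>());
        String conn = pool.poll();
        return conn != null ? conn : "conn->" + host;  // reuse idle, else open new
    }

    public void release(String host, String conn) {
        pools.computeIfAbsent(host, h -> new ArrayDeque<>()).push(conn);
    }

    public int idleCount(String host) {
        Deque<String> pool = pools.get(host);
        return pool == null ? 0 : pool.size();
    }
}
```

Because each host has its own pool, a slow or dead node cannot starve connections destined for the healthy ones.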

Choosing Between Kubernetes Service and Multiple Hosts in Code:

•	If you prefer simplicity and are running in Kubernetes, using a single Service endpoint may be sufficient, as Kubernetes will handle load balancing and failover.
•	If you need more control over failover behavior, retries, or want fine-grained visibility into node status and client-managed load balancing, configuring multiple nodes directly in the Elasticsearch client might be more beneficial.

Type of change

  • Bug fix (fixes an issue)
  • New feature (adds functionality)

Related issues

Fix #1

Checklists

Development

  • Lint rules pass locally
  • Application changes have been tested thoroughly
  • Automated tests covering modified code pass

Security

  • Security impact of change has been considered
  • Code follows company security practices and guidelines

Code review

  • Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary
  • "Ready for review" label attached and reviewers assigned
  • Changes have been reviewed by at least one other contributor
  • Pull request linked to task tracker where applicable

@sriram-atlan sriram-atlan merged commit ae47560 into beta Nov 15, 2024
5 checks passed
@sumandas0 sumandas0 deleted the UseMultipleESHosts branch November 15, 2024 07:30
@sumandas0 sumandas0 restored the UseMultipleESHosts branch November 15, 2024 07:30