Introduce node level circuit breaker settings for k-NN #2507

Closed
wants to merge 5 commits into from


markwu-sde
Contributor

Description

The k-NN plugin currently applies a single cluster-wide circuit breaker limit. A single limit fits poorly when nodes in the cluster have different memory capacities.

Solution

Added node-specific circuit breaker limits keyed by a node attribute. This allows heterogeneous circuit breaker limits across nodes in the same cluster.

Usage

  1. Set a node attribute in opensearch.yml:
node.attr.knn_cb_tier: "high"  # Examples: "high", "low", "standard"
  2. Configure the limit for that tier:
PUT /_cluster/settings
{
  "persistent": {
    "knn.memory.circuit_breaker.limit.high": "75%"
  }
}
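
As a rough illustration of step 2, the request body can be generated per tier. This is only a sketch: `tier_limit_body` is a hypothetical helper (not part of the plugin), and the key pattern `knn.memory.circuit_breaker.limit.<tier>` is inferred from the example above.

```python
import json

# Key prefix taken from the usage example; the tier name is appended to it.
LIMIT_PREFIX = "knn.memory.circuit_breaker.limit"


def tier_limit_body(tier: str, limit: str) -> dict:
    """Build a persistent cluster-settings body for one tier (hypothetical helper)."""
    if not tier:
        raise ValueError("tier name must not be empty")
    return {"persistent": {f"{LIMIT_PREFIX}.{tier}": limit}}


if __name__ == "__main__":
    # Prints a body that could be sent with PUT /_cluster/settings.
    print(json.dumps(tier_limit_body("high", "75%"), indent=2))
```

The tier passed here should match the value of node.attr.knn_cb_tier on the nodes the limit is meant for.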

Implementation

  • Uses OpenSearch's groupSetting for dynamic limit configurations
  • Cache initially uses cluster-level default
  • Updates to node-specific limit when node attributes are available
  • Falls back to cluster default if no node-specific limit exists
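
The fallback order in the bullets above can be sketched roughly as follows. This is a simplified model in Python, not the plugin's Java code: `resolve_limit` and its parameters are hypothetical names, and the "50%" default is only an illustrative value.

```python
from typing import Mapping, Optional


def resolve_limit(
    cluster_default: str,
    tier_limits: Mapping[str, str],
    node_tier: Optional[str],
) -> str:
    """Return the circuit breaker limit that applies to one node.

    node_tier models the value of node.attr.knn_cb_tier, or None when the
    attribute is unset or not yet available.
    """
    if node_tier is None:
        # Node attributes not available yet: use the cluster-level default.
        return cluster_default
    # Use the tier's limit if one is configured, else fall back to the default.
    return tier_limits.get(node_tier, cluster_default)


if __name__ == "__main__":
    limits = {"high": "75%"}  # assumed example configuration
    print(resolve_limit("50%", limits, None))    # attribute not available
    print(resolve_limit("50%", limits, "high"))  # tier with its own limit
    print(resolve_limit("50%", limits, "low"))   # tier without a limit
```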

Testing

Modified KNNCircuitBreakerIT integration tests to include node-level CB.

Used OpenSearch Benchmark (OSB) to run a modified low-load test with and without the node-level circuit breaker.

  • Added node.attr.knn_cb_tier = 'integ' to build.gradle
  • Ran 5-query search-only vector benchmark after indexing
  • Verified memory usage and circuit breaker behavior

Initial state (no limits set):

{
  "graph_memory_usage_percentage": 1.35,
  "graph_memory_usage": 110056
}

After setting node limit to 500000kb:

curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "knn.memory.circuit_breaker.limit" : "500000kb"
  }
}
'

Results:

{
  "graph_memory_usage_percentage": 22.01,
  "graph_memory_usage": 110056,
  "circuit_breaker_triggered": false
}

Setting the cluster limit to 5kb did not affect the node with a node-specific limit, which confirms that the node-level limit overrides the cluster default.
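
The percentage reported after the limit change is consistent with usage divided by limit. The sketch below checks that arithmetic under two assumptions: graph_memory_usage is in the same unit as the limit (kilobytes), and the breaker trips once usage exceeds the limit. The function names are hypothetical, not plugin APIs.

```python
def usage_percentage(usage_kb: int, limit_kb: int) -> float:
    """Graph memory usage as a percentage of the limit, rounded to 2 places."""
    if limit_kb <= 0:
        raise ValueError("limit must be positive")
    return round(100.0 * usage_kb / limit_kb, 2)


def would_trip(usage_kb: int, limit_kb: int) -> bool:
    """Assumed trigger rule: the breaker trips when usage exceeds the limit."""
    return usage_kb > limit_kb


if __name__ == "__main__":
    # Values from the results above: 110056 used against a 500000kb limit.
    print(usage_percentage(110056, 500000))  # 22.01
    print(would_trip(110056, 500000))        # False
```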

Related Issues

Resolves #2263

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Development

Successfully merging this pull request may close these issues.

[FEATURE] Introduce node level circuit breaker settings for k-NN