[Fleet] Improving bulk actions for more than 10k agents #134565
@elasticsearch merge upstream
@elasticmachine merge upstream
a736e53 to ea9bfe5
Pinging @elastic/fleet (Team:Fleet)
I've come across this issue once before when trying to action more than 10k agents; it occurred when trying to update that many documents in Elasticsearch at once.
const result: BulkActionResult = {
let results;

if (!skipSuccess) {
Omitting successful agents from the result to avoid hitting the HTTP response limit (currently only for more than 10k actions).
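As a rough illustration of the idea in this comment, the sketch below drops successful entries from a result list when a `skipSuccess` flag is set, so that only failures are returned for very large actions. The `Outcome` and `ResultItem` types and the `toResultItems` helper are hypothetical names made up for this example; they are not the types used in the PR.

```typescript
// Hypothetical per-agent outcome of a bulk action (not the PR's actual type).
interface Outcome {
  id: string;
  error?: string; // present only when the action failed for this agent
}

// Hypothetical response item; the shape is an assumption for illustration.
interface ResultItem {
  id: string;
  success: boolean;
  error?: string;
}

// Builds the response items. Failures are always reported; successes are
// included only when skipSuccess is false, which keeps the payload small
// when most of a very large batch succeeds.
function toResultItems(outcomes: Outcome[], skipSuccess: boolean): ResultItem[] {
  const items: ResultItem[] = [];
  for (const outcome of outcomes) {
    if (outcome.error !== undefined) {
      items.push({ id: outcome.id, success: false, error: outcome.error });
    } else if (!skipSuccess) {
      items.push({ id: outcome.id, success: true });
    }
  }
  return items;
}
```

With this shape, a caller handling more than 10k agents would pass `skipSuccess = true` and treat any agent missing from the result as successful.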
Test results on 8.3 branch (8.4 doesn't work): ESS instance:
Issue with Fleet Server:
Changes after review LGTM. Really great set of performance and consistency improvements here. Thank you for all your work on this!
@@ -65,6 +65,7 @@ export const postBulkAgentsUnenrollHandler: RequestHandler<
...agentOptions,
revoke: request.body?.revoke,
force: request.body?.force,
batchSize: request.body?.batchSize,
Thank you for clarifying this. I understand the implementation here much better now 👍
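The hunk above forwards an optional `batchSize` from the request body. A minimal sketch of how such an option could drive batched processing follows. The default of 10000 is an assumption taken from the "batches of 10k" wording in the commit message, and `resolveBatchSize` and `applyInBatches` are hypothetical helpers, not functions from the PR.

```typescript
// Assumed default, based on the "batches of 10k" wording in the commit message.
const DEFAULT_BATCH_SIZE = 10000;

// Falls back to the default when no batch size is requested and rejects
// values that cannot be used as a batch size.
function resolveBatchSize(requested?: number): number {
  if (requested === undefined) {
    return DEFAULT_BATCH_SIZE;
  }
  if (requested < 1 || Math.floor(requested) !== requested) {
    throw new Error("batchSize must be a positive integer");
  }
  return requested;
}

// Applies `action` to consecutive slices of `agentIds` so that only one
// batch is handled at a time. Returns the number of batches processed.
function applyInBatches(
  agentIds: string[],
  action: (batch: string[]) => void,
  requestedBatchSize?: number
): number {
  const batchSize = resolveBatchSize(requestedBatchSize);
  let batches = 0;
  for (let start = 0; start < agentIds.length; start += batchSize) {
    action(agentIds.slice(start, start + batchSize));
    batches += 1;
  }
  return batches;
}
```

Passing a small `batchSize` in a test makes the batching path reachable with a handful of agents instead of more than 10k.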
💚 Build Succeeded
* changed getAllAgentsByKuery to query all agents with pit and search_after
* added internal api to test pit query
* changed reassign to work on batches of 10k
* unenroll in batches
* upgrade in batches
* fixed upgrade
* added tests
* cleanup
* revert changes in getAllAgentsByKuery
* renamed perPage to batchSize in bulk actions
* fixed test
* try catch around close pit

Co-authored-by: Kibana Machine <[email protected]>
(cherry picked from commit 2732f26)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation
…35104)
* changed getAllAgentsByKuery to query all agents with pit and search_after
* added internal api to test pit query
* changed reassign to work on batches of 10k
* unenroll in batches
* upgrade in batches
* fixed upgrade
* added tests
* cleanup
* revert changes in getAllAgentsByKuery
* renamed perPage to batchSize in bulk actions
* fixed test
* try catch around close pit

Co-authored-by: Kibana Machine <[email protected]>
(cherry picked from commit 2732f26)
Co-authored-by: Julia Bardi <[email protected]>
This change documents the ability to leverage a batchSize body parameter for the bulk_reassign_agents_request which was introduced in the following PR: elastic/kibana#134565 This will help align the documentation with the Kibana API docs: https://www.elastic.co/guide/en/fleet/current/fleet-apis.html#bulkReassignAgents
…er (#1465) This change documents the ability to leverage a batchSize body parameter for the bulk_reassign_agents_request which was introduced in the following PR: elastic/kibana#134565 This will help align the documentation with the Kibana API docs: https://www.elastic.co/guide/en/fleet/current/fleet-apis.html#bulkReassignAgents
…er (#1465) This change documents the ability to leverage a batchSize body parameter for the bulk_reassign_agents_request which was introduced in the following PR: elastic/kibana#134565 This will help align the documentation with the Kibana API docs: https://www.elastic.co/guide/en/fleet/current/fleet-apis.html#bulkReassignAgents (cherry picked from commit 421a653)
…er (#1465) (#1476) This change documents the ability to leverage a batchSize body parameter for the bulk_reassign_agents_request which was introduced in the following PR: elastic/kibana#134565 This will help align the documentation with the Kibana API docs: https://www.elastic.co/guide/en/fleet/current/fleet-apis.html#bulkReassignAgents (cherry picked from commit 421a653) Co-authored-by: Austin Smith <[email protected]> Co-authored-by: David Kilfoyle <[email protected]>
…er (#1465) (#1477) This change documents the ability to leverage a batchSize body parameter for the bulk_reassign_agents_request which was introduced in the following PR: elastic/kibana#134565 This will help align the documentation with the Kibana API docs: https://www.elastic.co/guide/en/fleet/current/fleet-apis.html#bulkReassignAgents (cherry picked from commit 421a653) Co-authored-by: Austin Smith <[email protected]> Co-authored-by: David Kilfoyle <[email protected]>
Summary
Improving bulk actions for more than 10k agents #133388
Changed `getAllAgentsByKuery` (used by bulk actions only) to query all agents with point in time query and `search_after` for datasets bigger than 10k.
Tested locally by changing `SO_SEARCH_LIMIT` to 5 and bulk actioning more than 10 agents by selecting all at once (with 5 page size on UI)
Pending work:
* Find a way to write api integration test without having to put more than 10k agents to ES. Could be an internal API endpoint exposed which takes page size as a parameter
  … `perPage` value than 10k, added integration test to verify logic. The response returns the real `total` value, and the first 10 agents in `items`.
* Test with actually more than 10k agents enrolled with horde
* Change the logic to perform the actions in batches rather than all agents at once in memory - we might hit a memory limit if we try to do it at once.
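The paging approach described in this summary can be sketched with an in-memory stand-in for the search backend. This is not the PR's implementation and does not call the Elasticsearch client: `AgentDoc`, `enrolledAt`, `searchPage` and `forEachBatch` are hypothetical names, copying the array only imitates the stable view a point in time provides, and a real client call would be asynchronous.

```typescript
// Hypothetical agent document; only the fields this sketch needs.
interface AgentDoc {
  id: string;
  enrolledAt: number;
}

// Sort values of the last hit on a page, used as the cursor for the next page.
type SortCursor = [number, string];

// Total order on (enrolledAt, id); the id tiebreaker keeps the cursor unambiguous.
function compareDocs(a: AgentDoc, b: AgentDoc): number {
  if (a.enrolledAt !== b.enrolledAt) {
    return a.enrolledAt - b.enrolledAt;
  }
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// In-memory stand-in for one search request: returns up to `size` docs that
// sort strictly after `searchAfter`, or the first page when it is undefined.
function searchPage(snapshot: AgentDoc[], size: number, searchAfter?: SortCursor): AgentDoc[] {
  const sorted = snapshot.slice().sort(compareDocs);
  if (searchAfter === undefined) {
    return sorted.slice(0, size);
  }
  const cursorDoc: AgentDoc = { enrolledAt: searchAfter[0], id: searchAfter[1] };
  return sorted.filter((doc) => compareDocs(doc, cursorDoc) > 0).slice(0, size);
}

// Walks every document in pages of `batchSize`, handing each page to
// `onBatch`, and returns how many documents were visited.
function forEachBatch(
  agents: AgentDoc[],
  batchSize: number,
  onBatch: (batch: AgentDoc[]) => void
): number {
  // Copying the array stands in for opening a point in time: later changes
  // to `agents` do not affect the pages returned by this walk.
  const snapshot = agents.slice();
  let cursor: SortCursor | undefined = undefined;
  let seen = 0;
  while (true) {
    const page = searchPage(snapshot, batchSize, cursor);
    if (page.length === 0) {
      break;
    }
    onBatch(page);
    seen += page.length;
    const last = page[page.length - 1];
    cursor = [last.enrolledAt, last.id];
  }
  return seen;
}
```

Because each page starts strictly after the last sort values of the previous one, the walk never needs an offset beyond the page size, and only one batch is held by the caller at a time.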
Checklist