[Fleet] Support bulk operation for more than 10k agents #133388

Closed
nchaulet opened this issue Jun 2, 2022 · 10 comments
Labels
enhancement New value added to drive a business result Project:FleetScaling QA:Validated Issue has been validated by QA Team:Fleet Team label for Observability Data Collection Fleet team


@nchaulet
Member

nchaulet commented Jun 2, 2022

Description

We currently use the Elasticsearch search API to retrieve agents for bulk operations, which limits them to 10k agents. We should change that so bulk operations (upgrade, unenroll) support more than 10k agents.

We can probably use the Elasticsearch scroll API to do this, and create multiple .fleet-actions documents with the same action id.
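A minimal sketch of the "multiple .fleet-actions documents with the same action id" idea. The document shape (`action_id`, `type`, `agents`, `@timestamp`), the helper `buildActionDocs`, and the batch size are assumptions for illustration, not the actual Fleet schema or code:

```typescript
// Hypothetical shape of a .fleet-actions document; field names are assumptions.
interface FleetActionDoc {
  action_id: string;
  type: string;
  agents: string[];
  "@timestamp": string;
}

// Split a large list of agent ids into several action documents that all share
// one action id, so no single document has to hold every agent.
function buildActionDocs(
  actionId: string,
  type: string,
  agentIds: string[],
  batchSize: number,
  timestamp: string = new Date().toISOString()
): FleetActionDoc[] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const docs: FleetActionDoc[] = [];
  for (let i = 0; i < agentIds.length; i += batchSize) {
    docs.push({
      action_id: actionId,
      type,
      agents: agentIds.slice(i, i + batchSize),
      "@timestamp": timestamp,
    });
  }
  return docs;
}

const agents = Array.from({ length: 25_000 }, (_, i) => `agent-${i}`);
const docs = buildActionDocs("action-1", "UPGRADE", agents, 10_000);
console.log(docs.length, docs.map((d) => d.agents.length)); // 3 [ 10000, 10000, 5000 ]
```

Each document carries a bounded slice of agents, while consumers can still treat them as one action by grouping on the shared `action_id`.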

@nchaulet nchaulet added enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team labels Jun 2, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jen-huang
Contributor

Good details in this related bug report as well: #133548

@joshdover
Contributor

Let's make sure to include reassigning policies as well, and that we use a single action for multiple agents rather than one action per agent.

@joshdover
Contributor

FYI, let's make sure we are not using the Elasticsearch scroll API for this or for #91562. Instead, we should use the point in time (PIT) API in conjunction with the search_after parameter. The scroll API does not preserve sorting across pagination requests, which can result in duplicate or missing entries as the index data changes.
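The PIT + search_after pattern can be sketched without a cluster. Here `openPointInTime` and `searchPage` are hypothetical in-memory stand-ins, not the Elasticsearch client API; in a real request they roughly correspond to the search API's `pit`, `sort`, and `search_after` parameters. The sort includes a unique tiebreaker (the agent id) so that hits with equal `enrolledAt` values are neither skipped nor repeated between pages:

```typescript
interface Agent {
  id: string;
  enrolledAt: number;
}

type SortValues = [number, string];

interface Hit {
  agent: Agent;
  sort: SortValues; // [enrolledAt, id]; the id acts as a unique tiebreaker
}

function compareSort(a: SortValues, b: SortValues): number {
  if (a[0] !== b[0]) return a[0] - b[0];
  return a[1] < b[1] ? -1 : a[1] > b[1] ? 1 : 0;
}

// Stand-in for opening a point in time: a sorted, frozen copy of the index.
function openPointInTime(index: Agent[]): Hit[] {
  return index
    .map((agent): Hit => ({ agent, sort: [agent.enrolledAt, agent.id] }))
    .sort((a, b) => compareSort(a.sort, b.sort));
}

// Stand-in for one search request: up to `size` hits strictly after `searchAfter`.
function searchPage(pit: Hit[], size: number, searchAfter?: SortValues): Hit[] {
  if (searchAfter === undefined) return pit.slice(0, size);
  const after: SortValues = searchAfter;
  const start = pit.findIndex((h) => compareSort(h.sort, after) > 0);
  return start === -1 ? [] : pit.slice(start, start + size);
}

// Walk every hit page by page, feeding the last hit's sort values into the next request.
function collectAllAgentIds(pit: Hit[], size: number): string[] {
  const ids: string[] = [];
  let searchAfter: SortValues | undefined = undefined;
  for (;;) {
    const page = searchPage(pit, size, searchAfter);
    if (page.length === 0) break;
    for (const hit of page) ids.push(hit.agent.id);
    searchAfter = page[page.length - 1].sort;
  }
  return ids;
}

// 25k agents with many equal enrolledAt values, to exercise the tiebreaker.
const index: Agent[] = Array.from({ length: 25_000 }, (_, i) => ({
  id: `agent-${i}`,
  enrolledAt: i % 100,
}));
const pit = openPointInTime(index);
index.push({ id: "agent-late", enrolledAt: 0 }); // enrolled after the PIT was opened
const ids = collectAllAgentIds(pit, 1_000);
console.log(ids.length, new Set(ids).size, ids.indexOf("agent-late") !== -1); // 25000 25000 false
```

Because each request resumes from the last hit's sort values rather than from an offset, the loop never grows `from + size`, and the snapshot ignores agents enrolled after the PIT was opened.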

@juliaElastic
Contributor

@joshdover I tried out point in time search, and it seems to work with page and perPage options.

I think search_after is not the best fit here: the UI allows jumping to an arbitrary page (e.g. page 5), whereas search_after requires the sort values of the last hit on the previous page.


@joshdover
Contributor

@juliaElastic Makes sense, PIT + page options should be fine. My main point was to avoid using the scroll API.

@juliaElastic
Contributor

juliaElastic commented Jun 15, 2022

@joshdover I noticed that a PIT search doesn't return agents enrolled after the PIT was opened. This means that while the Agent list UI stays open, newly added agents don't show up; they only appear after navigating away and coming back.
Do we want to handle this scenario?

EDIT: I managed to test with 10k+ agents, and I still get the same error as before when using PIT with page and pageSize.

search_phase_execution_exception: [illegal_argument_exception] Reason: Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
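The error follows from the arithmetic: PIT does not change how page/perPage are translated into `from`/`size`, so deep pages still exceed `index.max_result_window` (10,000 by default). A small sketch, assuming 20 agents per page (one combination that yields the 10020 in the message); `toFromSize` and `exceedsResultWindow` are hypothetical helpers:

```typescript
// Default value of the index.max_result_window index setting.
const MAX_RESULT_WINDOW = 10_000;

// The from/size that a 1-based page + perPage UI request translates to.
function toFromSize(page: number, perPage: number): { from: number; size: number } {
  return { from: (page - 1) * perPage, size: perPage };
}

// Elasticsearch rejects a request whose from + size exceeds the result window.
function exceedsResultWindow(page: number, perPage: number, maxWindow = MAX_RESULT_WINDOW): boolean {
  const { from, size } = toFromSize(page, perPage);
  return from + size > maxWindow;
}

console.log(toFromSize(501, 20)); // { from: 10000, size: 20 }
console.log(exceedsResultWindow(500, 20)); // false (from + size = 10000)
console.log(exceedsResultWindow(501, 20)); // true (from + size = 10020, matching the error above)
```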

@joshdover
Contributor

Hmm, so maybe we shouldn't use PIT for pagination in the UI for this issue, only when iterating through the agent list on the backend for a bulk action where "select everything on all pages" is selected. In that case we should be able to use search_after to avoid this error, and missing newly enrolled agents shouldn't be a problem, since we'll iterate through the whole list quite quickly.

Pagination in the UI is a separate problem that needs to be solved in #91562.

@ablnk

ablnk commented Jun 24, 2022

Hey @juliaElastic @joshdover, I noticed a small UI issue. Would you mind checking it out? #135120

@amolnater-qasource

Hi @juliaElastic @jen-huang
We have revalidated this feature on the latest 8.4 Snapshot and found it working fine.
We are able to successfully perform the following bulk actions on more than 10k agents:

  • Assign New Policy
  • Add/remove tags
  • Unenroll Agents
  • Force Unenroll Agents
  • Upgrade Agents
  • Schedule Upgrade

Build details:
VERSION: 8.4.0 Snapshot
BUILD: 54194
COMMIT: f94d5ff

Screenshots:

Please let us know if we are missing anything here.
Thanks

@jen-huang jen-huang added QA:Validated Issue has been validated by QA and removed QA:Needs Validation Issue needs to be validated by QA labels Jul 11, 2022