[META] Scale-Up Improvements on Single Load Generation Host #505

Closed
4 tasks done
IanHoang opened this issue Apr 4, 2024 · 3 comments
Labels
enhancement New feature or request Meta

Comments

@IanHoang
Collaborator

IanHoang commented Apr 4, 2024

Overview

Understanding the scalability of OpenSearch Benchmark's search clients is crucial for OSB's future development, as it will inform its usage patterns and drive future enhancements.

More details are laid out in this RFC. This meta issue pertains only to the first component of the tasks indicated in the RFC: an investigation into the scaling performance of OSB on a single load generation host.

The tasks can be broken up into the following milestones:

Milestones

Efforts:
S = estimated 1 week
M = estimated 4 weeks
L = estimated 6+ weeks

Note: These milestones are scoped to scaling up clients on a single load generation host (or node) within OSB. For scaling out OSB clients (that is, using 2+ load generation hosts or nodes), we'll need to develop an RFC for Distributed Workload Generation (DWG) as well as a separate list of milestones.

Milestone 1: Quantifying current limitations (M)
The OSB community is aware that there are limitations in terms of scaling clients within OSB, but is unsure of what those exact limitations are. Most of the time, users install OSB on a single load generation host and specify a number of clients, which provisions a certain number of threads. To test how well this works, a performance comparison between a cluster of nodes, each with OSB set to a single client, and a single node with OSB set to several clients will help uncover what those exact limitations are.

This will require setting up a testing apparatus. A few scripts can be written to speed up the performance testing and comparison process. The results should tell us whether OSB is accurately emulating metrics and provide insight into which OSB components are causing these limitations.
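As a rough illustration of the comparison described above, one of those scripts could compute a scaling efficiency: the throughput of a single host running N clients divided by the combined throughput of N hosts running one client each. This is only a sketch. The function name `scaling_efficiency` and the input shapes are hypothetical, not part of OSB, and the numbers in the usage lines are placeholders rather than measurements.

```python
def scaling_efficiency(distributed_ops_per_s, single_host_ops_per_s):
    """Compare one multi-client host against N single-client hosts.

    distributed_ops_per_s: throughput (ops/s) reported by each host that
        ran OSB with a single client (hypothetical input format).
    single_host_ops_per_s: throughput (ops/s) reported by one host that
        ran OSB with len(distributed_ops_per_s) clients.

    Returns a ratio: 1.0 means the single host matched the combined
    throughput of the single-client hosts; lower values suggest a
    scaling limitation on the single host.
    """
    baseline = sum(distributed_ops_per_s)
    if baseline <= 0:
        raise ValueError("need at least one positive baseline throughput")
    return single_host_ops_per_s / baseline


# Placeholder values for illustration only, not real benchmark results.
four_single_client_hosts = [100.0, 100.0, 100.0, 100.0]
one_host_four_clients = 300.0
print(scaling_efficiency(four_single_client_hosts, one_host_four_clients))
```

Tracking this ratio as the client count grows would show the point at which a single load generation host stops keeping up with the distributed baseline.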

Milestone 2: Identify the workarounds (S)
After understanding the limitations, we will determine whether there are any quick workarounds that users can apply to alleviate scaling limitations while work progresses on long-term solutions.

Based on the limitations we have discovered, we should look to make quick changes to the way OSB determines the number of worker actors to use and how it divides the clients amongst its workers. Outside of making changes to the codebase, we can publish a guide with some general rules of thumb to help users avoid issues.
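To make the idea of dividing clients amongst workers concrete, here is a minimal sketch that caps the number of workers at the available CPU cores and assigns client IDs round-robin. It is not OSB's actual allocation logic. `allocate_clients` and its `max_workers` parameter are hypothetical names used only for illustration.

```python
import os


def allocate_clients(num_clients, max_workers=None):
    """Assign client IDs 0..num_clients-1 to workers round-robin.

    max_workers defaults to the number of CPU cores on this host
    (an assumption for this sketch, not necessarily what OSB does).
    Returns a list with one list of client IDs per worker.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")

    # Never create more workers than there are clients to run.
    num_workers = min(num_clients, max_workers)
    allocations = [[] for _ in range(num_workers)]
    for client_id in range(num_clients):
        allocations[client_id % num_workers].append(client_id)
    return allocations


# Five clients on a host limited to two workers.
print(allocate_clients(5, max_workers=2))  # [[0, 2, 4], [1, 3]]
```

With this scheme, per-worker client counts differ by at most one, which makes it easy to reason about how many clients each worker carries when looking for the point where a worker becomes overloaded.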

Milestone 3: Investigate bottlenecks (or causes of limitations) and overcome bottlenecks (M)
For the limitations discovered in Milestone 1, we will need to investigate the bottlenecks in more depth and identify their causes. We should then identify and implement appropriate solutions to resolve those bottlenecks and remove the limitations found in OSB.

Since OSB might have workarounds incorporated by this point, we can spend effort investigating the bottlenecks. This will involve looking at specific components within OSB, such as the worker coordinator actor and the worker actors. By analyzing the actor system, we should be able to come up with appropriate solutions and potential redesigns to resolve bottlenecks.

Milestone 4: Review (S)
After all the work has been done, we should summarize our findings and solutions and ensure that OSB has been appropriately updated to handle scaling better.

From what we've discovered and implemented, we should draft subsequent action items (e.g. should there be any future enhancements or redesigns?). Additionally, work can begin on investigating DWG, which allows scaling out beyond a single load generation host.

For more information on each milestone, see the task issues / child issues in the following section:

Child Issues

META Issue containing issues related to scaling in OSB:

#593

@IanHoang IanHoang added the enhancement New feature or request label Apr 4, 2024
@IanHoang IanHoang self-assigned this Apr 4, 2024
@IanHoang IanHoang added Meta and removed untriaged labels Apr 4, 2024
@IanHoang IanHoang changed the title [META] Distributed Workload Generation Analysis and Scale Testing [META] Scale Testing Analysis and Distributed Workload Generation Apr 12, 2024
@gkamat gkamat moved this from Todo to In Progress in Performance Roadmap May 15, 2024
@gkamat gkamat moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Jun 4, 2024
@IanHoang IanHoang changed the title [META] Scale Testing Analysis and Distributed Workload Generation [META] Scale Testing Analysis Jun 20, 2024
@IanHoang IanHoang changed the title [META] Scale Testing Analysis [META] Scaling Investigation Jul 23, 2024
@IanHoang IanHoang changed the title [META] Scaling Investigation [META] Scale Testing Analysis Jul 23, 2024
@IanHoang IanHoang changed the title [META] Scale Testing Analysis [META] Scaling Investigation Jul 23, 2024
@IanHoang
Collaborator Author

IanHoang commented Jul 25, 2024

META issue containing issues related to scaling clients in OSB: #593

@getsaurabh02
Member

getsaurabh02 commented Jul 26, 2024

> For milestones for scaling out OSB clients, we'll need to develop an RFC for DWG as well as a separate list of milestones.

Can we expand on scaling out? Are we saying multiple clients (distributed)?

Also not sure how Milestone 1 and Milestone 3 are different?

@IanHoang
Collaborator Author

After discussing with @gkamat and @getsaurabh02 last week, I will perform a preliminary scaling investigation (see child issue scaling investigation #1) to get more data points for us to work with in the RFC and this META task.

@IanHoang IanHoang moved this from 🆕 New to 🏗 In progress in Engineering Effectiveness Board Jul 30, 2024
@IanHoang IanHoang changed the title [META] Scaling Investigation [META] Scale-Up Improvements on Single Load Generation Host Sep 10, 2024
@IanHoang IanHoang moved this from This Quarter to 🏗 In progress in OpenSearch Benchmark Roadmap Sep 26, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap Nov 19, 2024
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Engineering Effectiveness Board Nov 19, 2024