[Scaling Investigation] Validate Client Simulation Accuracy #557

Open · IanHoang opened this issue Jun 20, 2024 · 2 comments

Labels: Child Issue, enhancement (New feature or request)
IanHoang commented Jun 20, 2024

Experiment 1:

This is related to the scale testing RFC. For more details, see the RFC here.

To see other experiments in this analysis, see the META issue.

In this experiment we want to address the following questions:

  • Do search clients in OSB properly simulate actual clients in a client-server model?
  • For situations where workers have more than one search client, does OSB still properly simulate clients in a client-server model?

During a test, the Worker Coordinator Actor provisions and coordinates a number of Worker Actors that are responsible for driving requests to the system under test (SUT). Each Worker Actor is allocated a number of clients that perform steps (also known as tasks or operations in a workload). It’s worth mentioning that the number of Worker Actors is determined by the number of CPU cores (vCPUs) of the host running OSB.
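
To make the allocation model concrete, here is a minimal sketch of the behavior described above. This is an illustration only, not OSB’s actual implementation; the function name is hypothetical.

    import os

    def allocate_clients(num_clients, num_workers=None):
        """Distribute client IDs across Worker Actors round-robin.

        Illustrates the described behavior: one Worker Actor per CPU core,
        each worker driving one or more simulated clients.
        """
        if num_workers is None:
            num_workers = os.cpu_count()  # worker pool sized from the host's cores
        assignments = [[] for _ in range(num_workers)]
        for client_id in range(num_clients):
            assignments[client_id % num_workers].append(client_id)
        return assignments

    # On a 2-vCPU host with 8 search clients: two workers with 4 clients each.
    print(allocate_clients(num_clients=8, num_workers=2))
    # [[0, 2, 4, 6], [1, 3, 5, 7]]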

The two tables listed below (Table 1: Autoscaling Group with OpenSearch Benchmark, where each EC2 instance runs OpenSearch Benchmark with a single client, and Table 2: Load Generation Host with OpenSearch Benchmark) describe two series of experiments to determine whether a single load generation host can simulate the same performance as a set of instances that each act as a single independent client.

To reduce discrepancies, we ensure that the experiments in Table 2: Load Generation Host with OpenSearch Benchmark have no more than one client assigned per worker actor. This is reflected in how the number of clients is always equal to or less than the number of vCPUs. It matches how each instance in the ASG in Table 1: Autoscaling Group with OpenSearch Benchmark will always use only one vCPU (even though each has two).

Table 1: Determine Performance of an Autoscaling Group of N instances of OpenSearch Benchmark where search_clients = 1

| Autoscaling Group with OpenSearch Benchmark | Clients | Instance Type | Instance Count | vCPUs | Memory (GB) |
| --- | --- | --- | --- | --- | --- |
| Round 1 | 8 | c5.large | 8 | 16 | 32 |
| Round 2 | 16 | c5.large | 16 | 32 | 64 |
| Round 3 | 32 | c5.large | 32 | 64 | 128 |

In the table above, the gradual increase in instance count (with the same instance type) corresponds to a gradual progression of search clients, since each instance runs OSB with one search client. When all the instances have finished running OSB, we can use a script to aggregate the service-time results across all instances in the ASG.
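
As a sketch of what that aggregation could look like, assuming each instance exports its raw service-time samples (in milliseconds) to a JSON file; the file layout and field name here are hypothetical:

    import glob
    import json
    import statistics

    def aggregate_service_times(pattern="results/instance-*.json"):
        """Pool raw service-time samples from every ASG instance and compute
        percentiles over the combined distribution. Percentiles must be taken
        over the pooled raw samples; averaging each instance's p90/p99 would
        understate the tail.
        """
        samples = []
        for path in glob.glob(pattern):
            with open(path) as f:
                samples.extend(json.load(f)["service_time_ms"])  # hypothetical field
        cuts = statistics.quantiles(samples, n=100)
        return {
            "count": len(samples),
            "mean": statistics.fmean(samples),
            "p50": cuts[49],
            "p90": cuts[89],
            "p99": cuts[98],
        }

    print(aggregate_service_times())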

Table 2: Determine Performance of a Single Load Generation Host with OpenSearch Benchmark where search_clients = N

| LG Hosts with OpenSearch Benchmark | Simulated Clients (search_clients: N) | Instance Type | Instance Count | vCPUs | Memory (GB) |
| --- | --- | --- | --- | --- | --- |
| Round 1 | 8 | c5.2xlarge | 1 | 8 | 16 |
| Round 2 | 16 | c5.4xlarge | 1 | 16 | 32 |
| Round 3 | 32 | c5.9xlarge | 1 | 36 | 72 |

In the table above, there will only be a single load generation host.

After running the experiments from Tables 1 and 2, we should compare the results.

Table 3: Load Generation Host with OpenSearch Benchmark where search_clients = N & More Clients Per Worker

| LG Hosts with OpenSearch Benchmark | Simulated Clients (search_clients: N) | Instance Type | Instance Count | vCPUs | Memory (GB) | Clients Per Worker Actor |
| --- | --- | --- | --- | --- | --- | --- |
| Round 1 | 8 | c5.large | 1 | 2 | 4 | 4 |
| Round 2 | 16 | c5.large | 1 | 2 | 4 | 8 |
| Round 3 | 32 | c5.large | 1 | 2 | 4 | 16 |

Since worker actors can be allocated more than one client, we should also rerun the load generation host with OpenSearch Benchmark in a configuration where more clients are allocated to each worker actor, as seen in Table 3: Load Generation Host with OpenSearch Benchmark and More Clients Per Worker. This will confirm whether adding more clients to a worker (running on a smaller instance type with fewer CPU cores) can simulate the same performance as assigning one client per worker. In Round 1 above, we should expect to see two workers (since there are two vCPUs) with 4 clients each; in Round 2, two workers with 8 clients each; and in Round 3, two workers with 16 clients each (see the sketch below). We can compare these with the results from Table 2, where we tested the same configurations but kept one client per worker. If we see no degradation here, scaling investigation 2 should stress the load generation host and help us determine the maximum number of clients allowed per worker.
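
The expected worker/client split for each round can be sanity-checked with a few lines (assuming one Worker Actor per vCPU, as described above):

    # Table 3 rounds: (search_clients, vCPUs on the c5.large load generation host)
    for round_no, (clients, vcpus) in enumerate([(8, 2), (16, 2), (32, 2)], start=1):
        workers = vcpus  # one Worker Actor per vCPU
        print(f"Round {round_no}: {workers} workers x {clients // workers} clients each")
    # Round 1: 2 workers x 4 clients each
    # Round 2: 2 workers x 8 clients each
    # Round 3: 2 workers x 16 clients each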

Term Query

    {
        "name": "term",
        "operation-type": "search",
        "index": "{{index_name | default('big5')}}",
        "request-timeout": 7200,
        "body": {
            "query": {
                "term": {
                    "log.file.path": {
                        "value": "/var/log/messages/fuschiashoulder"
                    }
                }
            }
        }
    },

The term query above is considered a fast query in the Big5 workload and can be used for our experiment.
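
For a rough client-side cross-check outside of OSB, the same query can be sent with the opensearch-py client and timed directly. This is a sketch; the endpoint, security settings, and index name are assumptions.

    import time

    from opensearchpy import OpenSearch

    # Hypothetical endpoint for the system under test.
    client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

    query = {
        "query": {
            "term": {
                "log.file.path": {"value": "/var/log/messages/fuschiashoulder"}
            }
        }
    }

    start = time.perf_counter()
    response = client.search(index="big5", body=query)
    elapsed_ms = (time.perf_counter() - start) * 1000

    # "took" is the server-side time in ms; the perf_counter delta approximates
    # the client-observed service time (including network transfer).
    print(f"hits={response['hits']['total']['value']} took={response['took']}ms "
          f"service_time={elapsed_ms:.1f}ms")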

Metrics to Analyze

With each round of tests, we’ll compare metrics such as query throughput and service time between the clients in the ASG and the load generation host. We’ll also monitor resource utilization on the ASG instances, the load generation host, and the system under test. If the system under test shows signs of resource bottlenecks, we will scale it out and rerun the tests to ensure that the results are not skewed.

Why are we not using latency?

OSB’s definition of latency is slightly different from the colloquial one. In OSB, when a user specifies a target throughput with the target-throughput parameter, latency is the service time plus the time the request spends waiting in the queue. When target-throughput is not set, service time and latency are equivalent. The parameter exists for users who want to achieve a specific throughput, for example to simulate the throughput seen in their production clusters. For these experiments we will not set target-throughput, so the clients (in the ASG and on the load generation host) will send queries as fast as possible; we will therefore focus primarily on service time, which should be equivalent to latency. For more information, see this article from OSB’s documentation.
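
The distinction can be made concrete with a toy schedule (a sketch of the semantics described above, not OSB’s actual scheduler):

    # Toy model: requests are scheduled at 2 req/s (one every 0.5 s), but each
    # takes 0.8 s of service time, so a queue builds and latency grows while
    # service time stays flat.
    target_interval = 0.5  # seconds between scheduled requests (2 req/s)
    service_time = 0.8     # seconds the SUT takes per request

    free_at = 0.0  # time at which the single client is next free
    for i in range(5):
        scheduled = i * target_interval
        start = max(scheduled, free_at)   # request may wait in the queue first
        finish = start + service_time
        latency = finish - scheduled      # OSB latency = queue wait + service time
        free_at = finish
        print(f"req {i}: service_time={service_time:.1f}s latency={latency:.2f}s")
    # Without a target throughput, start == scheduled for every request, and
    # latency collapses to service time.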


IanHoang commented Jun 24, 2024

Set Up Experiment Prerequisites

  • Set up large OpenSearch cluster: 20 Data Nodes (r5.large), 3 Master Nodes (c5.2xlarge)
  • Set up Auto Scaling Group with OSB (Table 1)
    • Set up AMI with OSB and Big5 installed
    • Set up launch template (ensure that commands have tags to denote that these were run from the Auto Scaling Group).
    • Test out with test cluster
  • Set up single load generation host (Table 2 and 3)
  • Set up metric data store (MDS)
  • Create script that aggregates results from MDS and produces a summary of performance across all instances (or "clients") in the Auto Scaling Group (a sketch follows below)
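
A sketch of that aggregation step, assuming the MDS is an OpenSearch cluster holding OSB metric documents with name/value fields; the index pattern and the meta tag used to mark ASG runs are assumptions:

    from opensearchpy import OpenSearch

    # Hypothetical MDS endpoint.
    mds = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

    # Service-time percentiles across every instance in the ASG, filtered by a
    # hypothetical meta tag set via the launch template.
    body = {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": "service_time"}},
                    {"term": {"meta.source": "asg"}},  # hypothetical tag
                ]
            }
        },
        "aggs": {
            "service_time_pcts": {
                "percentiles": {"field": "value", "percents": [50, 90, 99]}
            }
        },
    }

    result = mds.search(index="benchmark-metrics-*", body=body)
    print(result["aggregations"]["service_time_pcts"]["values"])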

@IanHoang

Scaling investigation scripts were created a few weeks back. They can be found here: https://github.com/IanHoang/scaling-investigation
