Add high throughput integration test #5655
base: main
Conversation
```rust
pub fn merge(self, other: RestIngestResponse) -> Self {
    Self {
        num_docs_for_processing: self.num_docs_for_processing + other.num_docs_for_processing,
        num_ingested_docs: apply_op(self.num_ingested_docs, other.num_ingested_docs, |a, b| {
            a + b
        }),
        num_rejected_docs: apply_op(self.num_rejected_docs, other.num_rejected_docs, |a, b| {
            a + b
        }),
        parse_failures: apply_op(self.parse_failures, other.parse_failures, |a, b| {
            a.into_iter().chain(b).collect()
        }),
        num_too_many_requests: self.num_too_many_requests,
    }
}
```
I moved this back here as it makes more sense than in the API model because accumulating responses is quite specific to the rest client.
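For illustration, here is a minimal, hypothetical sketch of the caller side, showing why accumulation fits naturally in the REST client: each batched request yields one response, and the client folds them into a single summary. This is not the actual client code; the struct is trimmed to two fields, and the `apply_op` helper is an assumption (combine both values when both are `Some`, otherwise keep whichever side is present).

```rust
// Hypothetical, trimmed-down sketch (not the actual client code) of folding one
// response per batch into a single summary on the REST client side.
#[derive(Debug, Default)]
struct RestIngestResponse {
    num_docs_for_processing: u64,
    num_ingested_docs: Option<u64>,
}

// Assumed behavior: apply `op` when both sides are `Some`, otherwise keep the
// side that is present.
fn apply_op<T>(lhs: Option<T>, rhs: Option<T>, op: impl FnOnce(T, T) -> T) -> Option<T> {
    match (lhs, rhs) {
        (Some(a), Some(b)) => Some(op(a, b)),
        (Some(a), None) => Some(a),
        (None, rhs) => rhs,
    }
}

impl RestIngestResponse {
    fn merge(self, other: Self) -> Self {
        Self {
            num_docs_for_processing: self.num_docs_for_processing + other.num_docs_for_processing,
            num_ingested_docs: apply_op(self.num_ingested_docs, other.num_ingested_docs, |a, b| a + b),
        }
    }
}

fn main() {
    // One response per batched request, accumulated on the client side.
    let per_batch = vec![
        RestIngestResponse { num_docs_for_processing: 1_000, num_ingested_docs: Some(990) },
        RestIngestResponse { num_docs_for_processing: 1_000, num_ingested_docs: Some(1_000) },
    ];
    let summary = per_batch
        .into_iter()
        .fold(RestIngestResponse::default(), RestIngestResponse::merge);
    println!("ingested {:?} of {} docs", summary.num_ingested_docs, summary.num_docs_for_processing);
}
```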
```rust
.ingest(
    index_id,
    IngestSource::Str(body),
    Some(5_000_000),
```
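For context on the payload being sent, here is a hypothetical, self-contained sketch (not the actual test code) of how an NDJSON body of roughly the configured size could be generated; the field names and sizes are purely illustrative.

```rust
// Hypothetical sketch (not the actual test code): build an NDJSON payload of roughly
// `target_bytes` so each ingest request stays near the configured batch-size limit.
fn build_ndjson_payload(target_bytes: usize) -> String {
    let mut body = String::new();
    let mut doc_id = 0u64;
    while body.len() < target_bytes {
        body.push_str(&format!(
            "{{\"id\": {doc_id}, \"message\": \"high throughput test document\"}}\n"
        ));
        doc_id += 1;
    }
    body
}

fn main() {
    // Roughly 5 MB, matching the batch-size limit used in the snippet above.
    let body = build_ndjson_payload(5_000_000);
    println!("payload: {} bytes, {} docs", body.len(), body.lines().count());
}
```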
Casually observing.
Our batches tend to have 200-1000 items per bulk request and target 1-10 different indexes; the higher end is about 3 MB. The typical ingest rate here is about 900 MB/s.
Our retry workload is capped at 500 items per batch and sits around 1 MB. Those appear to touch a single index; we are trying to determine whether they all hit the same index.
I'm playing with the batch size in this test because I had a surprising 500 (internal timeout) error. We want the system to behave consistently regardless of the batch size (obviously with very small batches adding a bit of overhead).
What do we consider very small?
One overhead off the top of my head is the HTTP framing overhead. If you have more HTTP headers than document payload, that's a bit of unnecessary extra work.
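Rough back-of-the-envelope numbers (illustrative assumptions, not measurements): with ~500 B of HTTP headers per request, a 1 KB batch spends 500 / 1,000 ≈ 50% of its bytes on framing, while a 1 MB batch spends 500 / 1,000,000 ≈ 0.05%.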
The 1 MB granularity should be pretty good (negligible framing overhead in network exchanges, but small enough to be nicely balanced across shards).
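As a hypothetical illustration of that granularity (the helper below is an assumption, not existing code), an NDJSON payload can be cut into roughly 1 MB chunks on newline boundaries so that each chunk becomes one ingest request and documents are never split.

```rust
// Hypothetical helper, not existing code: split an NDJSON payload into chunks of
// roughly `max_chunk_bytes`, cutting only on line boundaries so documents stay
// intact (a single oversized document still forms its own chunk).
fn split_ndjson(payload: &str, max_chunk_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for line in payload.lines() {
        // Start a new chunk if adding this document would exceed the target size.
        if !current.is_empty() && current.len() + line.len() + 1 > max_chunk_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let payload = "{\"id\": 1}\n{\"id\": 2}\n{\"id\": 3}\n";
    // A 1 MB target: this small payload fits into a single chunk/request.
    let chunks = split_ndjson(payload, 1_000_000);
    assert_eq!(chunks.len(), 1);
    println!("{} chunk(s)", chunks.len());
}
```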
TODO BEFORE MERGING: figure out why the high throughput test fails with a big payload but not with smaller ones
Description
This PR reuses the tests and docs proposed in #5644, which itself is no longer necessary now that the status code was fixed to 429 when shards need scaling up (#5651).
How was this PR tested?
Describe how you tested this PR.