Commit

Update math formatting for randomizing queries and the formatting guide (opensearch-project#9047)

Signed-off-by: Fanit Kolchina <[email protected]>
kolchfa-aws authored Jan 10, 2025
1 parent 3fdb3bc commit 21ac587
Showing 2 changed files with 11 additions and 6 deletions.
6 changes: 6 additions & 0 deletions FORMATTING_GUIDE.md
@@ -399,6 +399,12 @@ Some Markdown paragraph. Here's a formula:
And back to Markdown.
```

Alternatively, you can use double dollar signs (`$$`) for both display and inline math directly in Markdown:

```
The probability of selecting pair $$i$$ is proportional to $$1 \over i^\alpha$$.
```

## Tables

Markdown table columns are automatically sized, and there is no need to specify a different number of dashes in the formatting.
@@ -1,9 +1,10 @@
---
layout: default
title: Running randomized workloads
title: Randomizing queries
nav_order: 160
parent: Optimizing benchmarks
grand_parent: User guide
has_math: true
---

# Randomizing queries
@@ -29,15 +30,13 @@ For example, changing `"gte"` and `"lt"` in the following `nyc_taxis` operation
}
```


You can't completely randomize the values because the cache would not get any hits. To get cache hits, the cache must sometimes encounter the same values. To account for the same values while randomizing, OpenSearch Benchmark generates a number `N` of value pairs for each randomized operation at the beginning of the benchmark. OpenSearch Benchmark stores these values in a saved list where each pair is assigned an index from `1` to `N`.
You can't completely randomize the values because the cache would not get any hits. To get cache hits, the cache must sometimes encounter the same values. To account for the same values while randomizing, OpenSearch Benchmark generates a number $$N$$ of value pairs for each randomized operation at the beginning of the benchmark. OpenSearch Benchmark stores these values in a saved list where each pair is assigned an index from $$1$$ to $$N$$.

Every time OpenSearch Benchmark sends a query, it decides whether to use a pair of values from this saved list in the query. It does this a configurable fraction of the time, called the _repeat frequency_ (`rf`). If OpenSearch has encountered the value pair before, this might cause a cache hit. For example, if `rf` = 0.7, the cache hit ratio could be up to 70%. The actual hit ratio depends on the benchmark's duration and the cache size.

OpenSearch Benchmark selects saved value pairs using the Zipf probability distribution, where the probability of selecting pair `i` is proportional to `1/i^α`. In this formula, `i` represents the index of the saved value pair, and `α` controls how concentrated the distribution is. This distribution reflects usage patterns observed in real caches. Pairs with lower `i` values (closer to `1`) are selected more frequently, while pairs with higher `i` values (closer to `N`) are selected less often.

Otherwise, the other `1-rf` fraction of the time, a new random pair of values is generated. Because OpenSearch Benchmark has not encountered these value pairs before, the pairs should miss the cache.
OpenSearch Benchmark selects saved value pairs using the Zipf probability distribution, where the probability of selecting pair $$i$$ is proportional to $$1 \over i^\alpha$$. In this formula, $$i$$ represents the index of the saved value pair, and $$\alpha$$ controls how concentrated the distribution is. This distribution reflects usage patterns observed in real caches. Pairs with lower $$i$$ values (closer to $$1$$) are selected more frequently, while pairs with higher $$i$$ values (closer to $$N$$) are selected less often.
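
Normalizing the proportionality above gives the selection probability explicitly. This is the standard form of a Zipf distribution over $$N$$ items, written out here as a worked equation; it is not taken from the OpenSearch Benchmark source, and the summation index $$k$$ is introduced only for illustration:

$$P(i) = \frac{1 / i^{\alpha}}{\sum_{k=1}^{N} 1 / k^{\alpha}}, \quad i = 1, \dots, N$$

For example, with $$N = 3$$ and $$\alpha = 1$$, the unnormalized weights are $$1$$, $$1 \over 2$$, and $$1 \over 3$$, which sum to $$11 \over 6$$, so the three pairs are selected with probabilities $$6 \over 11$$, $$3 \over 11$$, and $$2 \over 11$$.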

The other $$1 -$$ `rf` fraction of the time, a new random pair of values is generated. Because OpenSearch Benchmark has not encountered these value pairs before, the pairs should miss the cache.
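
The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenSearch Benchmark's actual implementation: the function names (`zipf_weights`, `make_value_source`, `random_range_pair`) and the value ranges are hypothetical, and only the roles of $$N$$, `rf`, and $$\alpha$$ come from the text.

```python
import random
from itertools import accumulate


def zipf_weights(n, alpha):
    """Unnormalized Zipf weights 1 / i**alpha for indexes i = 1..n."""
    return [1.0 / (i ** alpha) for i in range(1, n + 1)]


def random_range_pair(rng):
    """Hypothetical generator for a ("gte", "lt") value pair."""
    low = rng.uniform(0.0, 100.0)
    return (low, low + rng.uniform(0.0, 10.0))


def make_value_source(n, rf, alpha, new_pair, rng):
    """Return a function that yields (pair, reused) for each query.

    n        -- number of saved value pairs generated up front
    rf       -- repeat frequency: fraction of queries that reuse a saved pair
    alpha    -- Zipf exponent; larger values favor low indexes more strongly
    new_pair -- callable producing a fresh random pair (an assumption here)
    """
    # Generate the saved list once, before any queries are issued.
    saved = [new_pair(rng) for _ in range(n)]
    # Cumulative weights let random.choices sample without renormalizing.
    cumulative = list(accumulate(zipf_weights(n, alpha)))

    def next_pair():
        if rng.random() < rf:
            # Reuse a saved pair, chosen with probability proportional to 1 / i**alpha.
            pair = rng.choices(saved, cum_weights=cumulative, k=1)[0]
            return pair, True
        # Otherwise generate a new pair, which should miss the cache.
        return new_pair(rng), False

    return next_pair


if __name__ == "__main__":
    source = make_value_source(n=5000, rf=0.7, alpha=1.0,
                               new_pair=random_range_pair,
                               rng=random.Random(42))
    reused = sum(source()[1] for _ in range(10_000))
    print(f"fraction of queries reusing a saved pair: {reused / 10_000:.3f}")
```

With `rf=0.7`, roughly 70% of the generated queries reuse a saved pair, which is an upper bound on the cache hit ratio rather than a prediction of it.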

## Usage
