Add additional PromQL operators to synthetic load #747
Conversation
Force-pushed ca2452f to 872732a
Solves #705
Force-pushed 872732a to 74f8a46
Hi, I commented on a few, but they are all likely problematic.
I recommend you change tack and use metrics exposed by the fake webserver, since there are a lot of them and prombench can control them.
```yaml
interval: 10s
type: instant
queries:
  - expr: sum(node_cpu_seconds_total)/sum(container_memory_rss)
```
This is not terribly good as a test of operator performance, since it matches one series against one other series, both of which have no labels.
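For contrast, a binary operation that matches many series pairwise, rather than dividing one aggregate by another, would exercise the operator's label-matching path. This is a hypothetical sketch, not from the PR; the `on (instance) group_left` matching assumes both metrics expose an `instance` label:

```yaml
queries:
  # Many-to-one matching: each per-CPU series is divided by the memory
  # total for its instance, so the operator runs across many series
  # instead of 1-vs-1. Illustrative only.
  - expr: rate(node_cpu_seconds_total[5m]) / on (instance) group_left sum by (instance) (container_memory_rss)
```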
```yaml
type: instant
queries:
  - expr: sum(node_cpu_seconds_total)/sum(container_memory_rss)
  - expr: rate(node_cpu_seconds_total[5m]) * 5
```
This is better, at 256 series when tested.
```yaml
queries:
  - expr: sum(node_cpu_seconds_total)/sum(container_memory_rss)
  - expr: rate(node_cpu_seconds_total[5m]) * 5
  - expr: sum(go_gc_heap_goal_bytes)/sum(loadgen_query_duration_seconds_created)
```
This has the same structural problem as the first one.
```yaml
interval: 10s
type: instant
queries:
  - expr: node_cpu_seconds_total{mode="nice"} and node_cpu_seconds_total{namespace="default"}
```
Nodes don't have a namespace, so this short-circuits and returns nothing.
Actually, I have tested this on one of the pull requests where prombench was running.
Ok but my recommendation remains the same.
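Background for the point above: without an `on (...)` clause, `and` only keeps left-hand series whose entire label set matches some right-hand series, so a selector that matches nothing (like `namespace="default"` on node metrics) empties the result. A hypothetical variant where both sides actually return series:

```yaml
queries:
  # Both selectors match real node series; `on (instance, cpu)` limits
  # the intersection to labels present on both sides. Illustrative
  # sketch, not a line from the PR.
  - expr: node_cpu_seconds_total{mode="nice"} and on (instance, cpu) node_cpu_seconds_total{mode="idle"}
```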
Force-pushed fcab2db to a16c6db
Replaced old metrics with new ones
Force-pushed a16c6db to 4991729
How many of those did you count?
Hi @bboreham, all the queries have values above 2000, and the codelab metric has 52,000 different variations.
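Series counts like these can be checked on the running Prometheus with `count()`, which returns the number of series a selector currently matches. A sketch, reusing metric names from the diff above:

```yaml
# One-off cardinality checks (run as instant queries, not as load):
# count() returns how many series each selector matches right now.
- expr: count(node_cpu_seconds_total)
- expr: count(codelab_api_request_duration_seconds_bucket)
```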
Updated metrics with higher series counts
Force-pushed c0c0f48 to 0a30d46
Thanks, this is getting better.
I am still interested in thinking about the cardinality you are expecting for each operator in each query.
At the end of the day there should be a balance across the different kinds of load, so we can justify that prombench is a realistic test, and we also want the Prometheus under load to be able to keep up.
```yaml
type: instant
queries:
  - expr: topk(2000, sum(rate(go_gc_duration_seconds_count[5m])) by (instance, job))
  - expr: topk(10000, sum(codelab_api_request_duration_seconds_bucket) by (method,job))
```
`topk(10000` is not realistic; nobody is going to scroll down 10,000 lines of screen output to find something. `k` should be more like 10, or perhaps 100. Also, I don't think there are 10,000 combinations of method and job. Also, it is not valid to sum histogram buckets.
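A sketch of the shape being suggested: a human-scale `k`, and bucket series fed through `rate()` into `histogram_quantile` (which needs the `le` label preserved) rather than summed directly. The `k` and quantile values here are illustrative, not from the PR:

```yaml
queries:
  # topk with a k someone would actually read
  - expr: topk(10, sum by (instance, job) (rate(go_gc_duration_seconds_count[5m])))
  # _bucket series consumed by histogram_quantile; keep `le` in the sum
  - expr: histogram_quantile(0.9, sum by (le, method) (rate(codelab_api_request_duration_seconds_bucket[5m])))
```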
```yaml
  - expr: codelab_api_request_duration_seconds_bucket{method="GET"} or codelab_api_request_duration_seconds_bucket{method="POST"}
  - expr: codelab_api_request_duration_seconds_sum{status="200"} or codelab_api_request_duration_seconds_sum{status="500"}
  - expr: codelab_api_request_duration_seconds_bucket{status="200"} and codelab_api_request_duration_seconds_bucket{method="GET"}
  - expr: codelab_api_request_duration_seconds_count{method="POST"} and codelab_api_request_duration_seconds_count{status="500"}
```
I don't see much point in doing multiple expressions that are essentially the same. `or` is different to `and`, but after that you could have a `/`, taking the ratio of errors to all requests, for instance.
I realized a couple of things since this comment: you have `/` in "arithmetic operation" above, but this `and` is never going to return anything because the labels on each side are different. We want the benchmark queries to make sense.
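The error-to-total ratio mentioned earlier might look like the following; a sketch reusing the `status` label values from the diff:

```yaml
queries:
  # Fraction of requests returning status 500, per method.
  # Illustrative query, not part of the PR.
  - expr: sum by (method) (rate(codelab_api_request_duration_seconds_count{status="500"}[5m])) / sum by (method) (rate(codelab_api_request_duration_seconds_count[5m]))
```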
Slow down arithmetic_operation and logic_operator; take out a few queries to avoid overloading the server. Stop querying `_bucket` series directly; those should be used by `histogram_quantile` or similar. Use more realistic `k` parameters to `topk`.
Signed-off-by: Bryan Boreham <[email protected]>
For balance, to retain about the same overall load on the server as before.
Signed-off-by: Bryan Boreham <[email protected]>
I trimmed down the newly-added queries a bit:
I also trimmed down some pre-existing queries for balance, to retain about the same overall load on the server as before.
This PR enhances the synthetic load generation by incorporating additional PromQL operators in the 6_loadgen.yaml file:
- the `topk` function, to test query performance with ranked results.