Pending requests metric increments and decrements do not match: after the service has been seriously stressed for a while, the metric value can be in the hundreds when it should be zero, i.e. while the service is idling without any pending requests. This makes the pending requests metric worse than useless.
Some kind of queue size metric is essential for service scaling, because with enough requests and multiple accelerated inferencing engines, the megaservice itself becomes a bottleneck and needs to be scaled.
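For illustration, here is a minimal sketch of how this kind of gauge drift typically arises, assuming a prometheus_client Gauge incremented on request entry and decremented only on the normal completion path (metric and function names below are hypothetical, not the actual GenAIComps code):

```python
from prometheus_client import Gauge

# Hypothetical gauge mirroring the pending-requests metric in question.
PENDING = Gauge("megaservice_request_pending",
                "Requests accepted but not yet completed")

async def handle_request(request):
    PENDING.inc()                         # incremented on entry
    result = await run_pipeline(request)  # hypothetical orchestrator call;
                                          # if it raises or is cancelled...
    PENDING.dec()                         # ...this line never runs
    return result
```

Under heavy load with many parallel connections, every request that errors out or is cancelled before reaching the decrement leaves the gauge permanently inflated by one.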
Notes
I did the PR adding this metric initially. At that point I did not notice this issue, possibly because the service was not stressed long enough for it to show up.
I've tried (earlier today) adding pending count decrements in different places in ServiceOrchestrator, but those either did not have any effect on the metric, or made it completely bogus (decreased it way too much).
I also tried just changing the metric to an incoming requests counter, but that does not really offer any significant value over the (already provided) completed requests count. It does not tell how stressed the megaservice is, the way the needed pending requests (= queue size) count would.
=> As I have other pressing OPEA work, I'm just documenting this here.
Reproduce steps
ChatQnA service, with multiple Gaudi-accelerated TGI or vLLM backend instances
Stress ChatQnA for an hour by constantly sending requests from a thousand parallel client connections
Stop sending new requests and wait until all pending ones have been processed
Query the metrics endpoint for the pending requests gauge: curl <svc:port>/metrics | grep pending
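A sketch of how that last check could be automated; the endpoint URL is an assumption, and the gauge is matched by the same "pending" substring as the grep above:

```python
import requests  # third-party HTTP client (pip install requests)

METRICS_URL = "http://localhost:8888/metrics"  # assumed service address

# Fetch the Prometheus exposition text and check the pending-requests gauge.
for line in requests.get(METRICS_URL, timeout=10).text.splitlines():
    if "pending" in line and not line.startswith("#"):
        value = float(line.rsplit(" ", 1)[-1])
        print(line)
        # After all requests have drained, the gauge should read 0;
        # on affected builds it stays in the hundreds instead.
        assert value == 0, f"pending gauge leaked: {value}"
```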
Raw log
No response
Attachments
No response
@Spycsh unless you have some good idea how to fix this issue for (my) merged #864 code, I think it needs to be reverted, and HTTP inprogress metrics used instead, despite their downsides.
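For context, prometheus_client already ships an in-progress helper whose decrement happens in a finally block, so it cannot leak; a minimal sketch with a hypothetical gauge name:

```python
from prometheus_client import Gauge

IN_PROGRESS = Gauge("megaservice_requests_inprogress",
                    "Requests currently being handled")

async def handle_request(request):
    # track_inprogress() calls inc() on entry and dec() in a finally
    # block, so the count is restored even on exception or cancellation.
    with IN_PROGRESS.track_inprogress():
        return await run_pipeline(request)  # hypothetical orchestrator call
```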
@eero-t , I think it may be affected by a race condition in the calling logic, even though the Gauge inc/dec operations themselves are atomic,
...
Do you think simply adding a lock here can solve this issue?
No, as the actual Metric value change is already protected by a Mutex:
The order in which increments and decrements are done between threads, i.e. different requests, does not matter. It matters only within an individual request, and because the increment always comes before the decrement in the request flow, everything is fine as long as every incremented request actually gets a matching decrement.
Btw. the number of extra pending requests seems to match the largest number of parallel requests/connections made to the Megaservice. Does that ring any bells as to where the decrement might be missing?
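That correlation would fit the decrement being skipped when a client disconnects mid-response: if the reply is produced by an async generator and the client goes away, the generator is closed (or the task cancelled) before the normal completion path runs. A hedged sketch of that suspected mechanism and a finally-based fix, using hypothetical names rather than the actual GenAIComps code path:

```python
from prometheus_client import Gauge

PENDING = Gauge("megaservice_request_pending",
                "Requests accepted but not yet completed")

async def stream_answer(request):
    PENDING.inc()
    try:
        async for token in generate_tokens(request):  # hypothetical token source
            yield token
    finally:
        # Runs even on GeneratorExit / asyncio.CancelledError, e.g. when
        # the client drops the connection mid-stream, so no leak.
        PENDING.dec()
```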
Priority
P3-Medium
OS type
Ubuntu
Hardware type
Gaudi2
Installation method
Deploy method
Running nodes
Single Node
What's the version?
latest images / Git HEAD