
[Bug] Megaservices pending requests metric goes wrong after a while #1121

Open

eero-t opened this issue Jan 7, 2025 · 3 comments
Labels

bug Something isn't working

Comments

@eero-t
Contributor

eero-t commented Jan 7, 2025

Priority

P3-Medium

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

latest images / Git HEAD.

Description

The pending requests metric's increments and decrements do not match: after the service has been heavily stressed for a while, the metric value can be in the hundreds when it should be zero, i.e. while the service is idling with no pending requests. This makes the pending requests metric (worse than) useless.

Some kind of queue size metric is essential for service scaling, because with enough requests and multiple accelerated inferencing engines, the megaservice itself becomes a bottleneck and needs to be scaled.

Notes

I did the PR that originally added this metric. At that point I did not notice this issue, possibly because the service was not stressed long enough for it to show up.

I've tried (earlier today) adding pending count decrements in different places in ServiceOrchestrator, but those either had no effect on the metric, or made it completely bogus (decreased it far too much).

I also tried just changing the metric to an incoming requests counter, but that does not really offer any significant value over the (already provided) completed requests count. It does not tell how stressed the megaservice is, the way the needed pending requests (= queue size) count would.

=> As I have other pressing OPEA work, I'm just documenting this here.
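
For context, here is a minimal sketch of the counter vs. gauge distinction above, assuming prometheus_client style metrics; the metric and function names are illustrative, not the ones used in GenAIComps:

    from prometheus_client import Counter, Gauge

    # Illustrative names, not the actual GenAIComps metrics.
    requests_total = Counter("megaservice_requests_total", "Requests received")
    requests_pending = Gauge("megaservice_request_pending", "Requests in flight")

    def on_request_start() -> None:
        requests_total.inc()    # monotonic: useful for rates, says nothing about current load
        requests_pending.inc()  # queue depth: what scaling decisions need

    def on_request_end() -> None:
        requests_pending.dec()  # must run exactly once per started request, on every exit path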

Reproduce steps

  • ChatQnA service, with multiple Gaudi accelerated TGI or vLLM backend instances
  • Stress ChatQnA for an hour by constantly sending requests from a thousand parallel client connections
  • Stop sending new requests and wait until all pending ones have been processed
  • Query the metrics endpoint for the pending requests gauge: curl <svc:port>/metrics | grep pending (see the sketch below)
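
A small helper for that last step (a sketch only; the URL is a placeholder and the metric is matched by name substring, use whatever the grep above actually prints):

    import urllib.request

    def pending_gauge_value(url: str, name_substring: str = "pending") -> float:
        """Return the value of the first sample whose name contains name_substring."""
        body = urllib.request.urlopen(url).read().decode()
        for line in body.splitlines():
            if line.startswith("#") or name_substring not in line:
                continue  # skip HELP/TYPE comments and unrelated metrics
            # Prometheus exposition format: "<name>{labels} <value>"
            return float(line.rsplit(" ", 1)[1])
        raise LookupError(f"no metric containing {name_substring!r} found")

    # After the service has gone idle, this should print 0; the bug shows a value in the hundreds.
    print(pending_gauge_value("http://<svc:port>/metrics"))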

Raw log

No response

Attachments

No response

eero-t added the bug label on Jan 7, 2025
@eero-t
Contributor Author

eero-t commented Jan 9, 2025

@Spycsh unless you have some good idea how to fix this issue for (my) merged #864 code, I think it needs to be reverted, and HTTP inprogress metrics used instead, despite their downsides.

@Spycsh
Member

Spycsh commented Jan 10, 2025

@eero-t , I think it may be affected by a race condition in the calling logic, even though the k8s Gauge inc/dec operations are atomic:

    def __init__(self):
        # Guard the inc/dec pair with a lock (requires `import threading`)
        self.lock = threading.Lock()

    def pending_update(self, increase: bool) -> None:
        # self.request_pending is the existing pending requests Gauge
        with self.lock:
            if increase:
                self.request_pending.inc()
            else:
                self.request_pending.dec()

Do you think simply adding a lock here can solve this issue?

@eero-t
Contributor Author

eero-t commented Jan 10, 2025

@eero-t , I think it may be affected by a race condition in the calling logic, even though the k8s Gauge inc/dec operations are atomic:
...
Do you think simply adding a lock here can solve this issue?

No, as the actual Metric value change is already protected by a mutex:

The order in which the increments and decrements are done across threads, i.e. different requests, does not matter. It only matters within an individual request, and because the increment always comes before the decrement in the request flow, everything is fine as long as every incremented request actually gets a matching decrement.

Btw. the number of extra pending requests seems to match the largest number of parallel requests/connections made to the Megaservice. Does that ring any bells as to where the decrement might be missing?
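
One direction for tracking this down (a sketch only, not the current GenAIComps code; the metric and function names are illustrative): if the decrement runs only on the normal return path, every request that raises, times out, or is cancelled because the client disconnects leaks one increment, which would fit the observed correlation with the peak number of parallel connections. Putting the decrement in a finally block, or using prometheus_client's Gauge.track_inprogress() helper, guarantees that every increment gets its decrement:

    from prometheus_client import Gauge

    # Illustrative name, not the actual metric defined in comps.
    request_pending = Gauge("megaservice_request_pending", "Requests in flight")

    async def handle_request(orchestrator, request):
        request_pending.inc()
        try:
            return await orchestrator.schedule(request)  # placeholder for the real call
        finally:
            # Runs on success, on exceptions and on asyncio.CancelledError
            # (e.g. client disconnect), so every inc() gets its dec().
            request_pending.dec()

    # Equivalent, using the helper that pairs inc/dec automatically:
    #     with request_pending.track_inprogress():
    #         ...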
