Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve resiliency of performance calculation requests #125

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tsunyoku
Copy link

Server has had some instability recently and made me realise that this isn't even really trying to be resilient.. I've added tenacity retry logic onto the performance usecases and opted to use raise_for_status instead of returning zeros on a failure - this seemed like odd and unintuitive behavior to me, even with the logging

@cmyui
Copy link
Member

cmyui commented Jul 16, 2024

hmm, i think this (& the previous code) lacks consideration that the osu client retries score submission, which is a part of the resiliency system.. i think setting pp=0 without a strategy to calculate these is worse than hard failing tbh

extra={"status": response.status_code},
)
return [(0.0, 0.0)] * len(scores)
response.raise_for_status()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i agree with moving to response.raise_for_status() to hard fail requests rather than setting 0

@tsunyoku
Copy link
Author

hmm, i think this (& the previous code) lacks consideration that the osu client retries score submission, which is a part of the resiliency system.. i think setting pp=0 without a strategy to calculate these is worse than hard failing tbh

my concern with this is that it has the implication that if performance-service never comes back up that the score will never submit. i think it would be better to have a score submitted with zero pp than no score

@cmyui
Copy link
Member

cmyui commented Jul 16, 2024

hmm, i think this (& the previous code) lacks consideration that the osu client retries score submission, which is a part of the resiliency system.. i think setting pp=0 without a strategy to calculate these is worse than hard failing tbh

my concern with this is that it has the implication that if performance-service never comes back up that the score will never submit. i think it would be better to have a score submitted with zero pp than no score

Yeah, ultimately a deadletter queue for performance calculations would be a nice solution here. Does the processor on performance service already support us building such a queue?

@tsunyoku
Copy link
Author

i would hope that logging the score id on failure would make it easier to create a strategy to recalculate scores in this case

but having an asynchronous score processing model like i've discussed before would solve all of these concerns i think, since we take separate ownership of calculating performance outside of the hot path (the actual score submission) so we can retry as much as we like (without relying on the client to retry) solong as we have the initial score data. lazer does this and i think it works well

@tsunyoku
Copy link
Author

Yeah, ultimately a deadletter queue for performance calculations would be a nice solution here. Does the processor on performance service already support us building such a queue?

not really, performance-service is intentionally detached from a concept of "this score has xpp" and rather taking in statistics and spitting out a value. i don't really know if i want to change that relationship either

@cmyui
Copy link
Member

cmyui commented Jul 16, 2024

Yea I think I agree that keeping it on score-service would be nice.

having an asynchronous score processing model like i've discussed before would solve all of these concerns i think, since we take separate ownership of calculating performance outside of the hot path (the actual score submission) so we can retry as much as we like

Yea imo this would likely look like a main perf calculation queue + a deadletter queue for failed requests (so that the main queue is not blocked by a single broken score); both on score service that depend on perf-service and the db

@tsunyoku
Copy link
Author

yep i agree, i think this is a "long term consideration" though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants