-
Notifications
You must be signed in to change notification settings - Fork 99
Conversation
Co-authored-by: Yamil Asusta <[email protected]>
if r.Hits > int64(b.Remaining) { | ||
metricOverLimitCounter.Add(1) | ||
b.Remaining = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was changed because the owner receives collapsed/accumulated hits, so if the remainder is 50, drip/leak brings it to 75 and the peers reported a total of 200 hits, it's fair to say that the quota has been consumed AND that the tokens be taken away given that drip was applied and we don't want to give more tokens away than we should (even if you were dripping exclusively on the owner).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great catch! However, this changes the semantics of asking for more than what is available. If we went forward with this change, any request that asks for more than is available will gobble up the remainder without actually using them.
For example: If my email limit is 1,000 and I've already sent 500 emails. Then I request to send 1,000 more emails, I would expect the 1,000 email request to be rejected as "over the limit". However, subsequent requests for 500 emails or less should be allowed, as I have 500 emails remaining within my rate limit window.
I think I have an idea of how to solve this. I'm adding some tests to cover this scenario and will have a PR soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had just merged in this behavior as a new behavior type DRAIN_OVER_LIMIT
. I faced the same usability issue, but I did not want to break legacy applications that may have depended on old logic.
#209
rl, ok := item.Value.(*RateLimitResp) | ||
if ok { | ||
return rl, nil | ||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, for LEAKY_BUCKET the peer doesn't really know what tokens have leaked since we last checked.... I guess we could have the peer calculate the leak... but that sounds icky... Maybe someone can convince me otherwise?
@thrawn01 there is more to this story we'd like to discuss.
This change (left in comments so its easier to discuss, we can remove or uncomment later) basically applies drip on both peers and owner. Thus the owner has the true count while peers are constantly changing.
This has benefits and drawbacks.
Benefits:
- If owner takes too long to broadcast (network failure, saturation, etc) the peers can eventually converge on OVER_LIMIT and the system can protect itself.
- If owner takes too long to broadcast, peers can start letting traffic through instead of completely closing the circuit (stuck on OVER_LIMIT until TTL of the cached response).
- If owner takes too long to broadcast, peers can start letting traffic through slowly given that the limit won't be immediately reset once the TTL of the cached response happens.
- If peer(s) suffer from network issues where their hits can't be flushed, they can operate individually until the converge. If high rate, they will converge at rate limiting, otherwise they will constantly be flapping.
Basically we want to reduce the single point of failure the owner introduces by being the only one that can lower the remaining count. Specially if the Duration is 1s (or less) with a high leak/drip rate.
This comes with a particular drawback. It makes it so that you give away more tokens that you'd normally would. For example, if you set an RPS of 100 and you have 20 instances, that means that you will have 20*100 RPS in the case the owner is not fast enough. The more nodes you add, the more permissive it is because it's constantly dripping on each individual node. Thus you have to be mindful of your underlying capacity/fleet of gubernator. The smaller the fleet, the more strict it will be, the bigger, the more permissive it will be in high drip/leak scenarios.
Closing in favor of: #219 |
Description
@elbuo8 will add more details later :)
References
Closes #208