-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Backoff based on HTTP statuses #537
Conversation
|
||
if (process.env.NODE_ENV !== 'test' && typeof this.timer.unref === 'function') { | ||
this.timer.unref(); | ||
} | ||
} | ||
|
||
start(): void { | ||
if (typeof this.metricsInterval === 'number' && this.metricsInterval > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typescript compiler stated that this was no longer necessary
src/repository/index.ts
Outdated
@@ -89,6 +95,7 @@ export default class Repository extends EventEmitter implements EventEmitter { | |||
instanceId, | |||
projectName, | |||
refreshInterval = 15_000, | |||
maxRefreshInterval = 120_000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we expose this in the constructor, it gives consumers the ability to just turn off our backoff. Do we want that? Tempted to use a constant here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's removed, and replaced by putting a maximum backoff of 10 * interval
@@ -812,18 +812,16 @@ acorn@^8.9.0: | |||
|
|||
agent-base@6, agent-base@^6.0.2: | |||
version "6.0.2" | |||
resolved "https://registry.npmjs.org/agent-base/-/agent-base-6.0.2.tgz" | |||
resolved "https://registry.yarnpkg.com/agent-base/-/agent-base-6.0.2.tgz#49fff58577cfee3f37176feab4c22e00f86d7f77" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The yarn lock update needed or just boyscouting?
Neither is an issue, just curious for context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happened automatically when I ran yarn install.
this.failures = max(this.failures - 1, 0); | ||
return this.nextFetch(); | ||
} | ||
|
||
async fetch(): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetch
is now just short of 100 lines. Might be worth breaking this down into smaller methods at this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. Something like what we did for metrics. I'll have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it does what it says on the tin, tests seem to cover it nicely too
Suggested some small things but I'll leave the choices up to you
9f9332d
to
56d5908
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, will continue reviewing
src/metrics.ts
Outdated
} | ||
|
||
getInterval(): number { | ||
if(this.metricsInterval === 0 || this.failures === -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should failures
be initialized to -1? I can see this working but it makes it hard to reason about when we're checking against -1 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failures start at 0, if we initialize it to -1 we'd never actually try to post metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But since you're asking, I guess it isn't as clear as I thought it'd be. extract to a method called stopPosting()
instead?
src/metrics.ts
Outdated
if (!res.ok) { | ||
if (res.status === 404 || res.status === 403 || res.status == 401) { | ||
this.unrecoverableError(url, res.status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really unrecoverable? If our API is down for a small period of time you may get a 404 response and it may recover later. Or you may get a 403 because your token does not have access to a resource, and then some admin might grant that token permissions. If we make this unrecoverable, they will need a restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember from the quick discovery/discussion session we had that we thought maybe not stop polling just treat it as a backoff-response, for the reasons @gastonfournier mentions here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 404:
- if your api is down, you should be getting 500 (service error) or a type of 502 or 504 (gateway timeouts), not 404, at least from our hosted instances. I guess there are reverse proxies out there that returns 404 if the service they're proxying is not available.
For 401/403:
Currently, you can't give tokens access after creation, so I'd be very surprised if you went from 403 to 200. You either have a client token or not. A client token which gets new access would simply see more features returned, not suddenly move from a FORBIDDEN to an OK response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doing what the Java SDK did instead, immediately log at error level that the API most likely is incorrectly configured and then back off to our maximum backoff (10 * interval), could also be an answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense! I'm still a bit worried we stop posting and we will not notice what's wrong. If we backoff might mean we get a request every now and then but not hammering our servers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for metrics is ok to be more aggressive and stop posting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed the method to configurationError
after discussion with Gaston
src/metrics.ts
Outdated
@@ -170,6 +190,19 @@ export default class Metrics extends EventEmitter { | |||
return true; | |||
} | |||
|
|||
unrecoverableError(url: string, statusCode: number) { | |||
this.emit(UnleashEvents.Warn, `${url} returning ${statusCode}, stopping metrics`); | |||
this.failures = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.failures = -1; | |
this.metricsInterval = 0; |
wouldn't this be enough? Just to avoid dealing with failures = -1 which is a bit confusing IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the benefit in having a value indicate unrecoverable error, but I think it should in that case be a separate variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, we're calling an explicit this.stop()
here already, so dropping the -1 seems fair. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I noticed the implementations are a little bit different between the metrics and the features pollings. Is that intentional?
Yes, the metrics is a POST action, whereas the polling is a GET. We're also more OK with losing metrics than not being able to get latest feature configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed some approvals got stale after recent changes, so here's my approval in case it unblocks you 👍
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [unleash-client](https://togithub.com/Unleash/unleash-client-node) | [`5.0.0` -> `5.3.0`](https://renovatebot.com/diffs/npm/unleash-client/5.0.0/5.3.0) | [![age](https://developer.mend.io/api/mc/badges/age/npm/unleash-client/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/unleash-client/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/unleash-client/5.0.0/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/unleash-client/5.0.0/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>Unleash/unleash-client-node (unleash-client)</summary> ### [`v5.3.0`](https://togithub.com/Unleash/unleash-client-node/releases/tag/v5.3.0) [Compare Source](https://togithub.com/Unleash/unleash-client-node/compare/v5.2.0...v5.3.0) #### What's Changed - fix: backward compatibility by [@​gastonfournier](https://togithub.com/gastonfournier) in [https://github.com/Unleash/unleash-client-node/pull/540](https://togithub.com/Unleash/unleash-client-node/pull/540) **Full Changelog**: Unleash/unleash-client-node@v5.2.0...v5.3.0 ### [`v5.2.0`](https://togithub.com/Unleash/unleash-client-node/releases/tag/v5.2.0) [Compare Source](https://togithub.com/Unleash/unleash-client-node/compare/v5.1.0...v5.2.0) #### What's Changed - chore(deps): update dependency [@​types/semver](https://togithub.com/types/semver) to v7.5.5 by [@​renovate](https://togithub.com/renovate) in [https://github.com/Unleash/unleash-client-node/pull/527](https://togithub.com/Unleash/unleash-client-node/pull/527) - feat: Backoff based on HTTP statuses by [@​chriswk](https://togithub.com/chriswk) in [https://github.com/Unleash/unleash-client-node/pull/537](https://togithub.com/Unleash/unleash-client-node/pull/537) **Full Changelog**: Unleash/unleash-client-node@v5.1.0...v5.2.0 ### [`v5.1.0`](https://togithub.com/Unleash/unleash-client-node/releases/tag/v5.1.0) [Compare Source](https://togithub.com/Unleash/unleash-client-node/compare/v5.0.0...v5.1.0) #### What's Changed - feat: variant feature status - client specification update by [@​Tymek](https://togithub.com/Tymek) in [https://github.com/Unleash/unleash-client-node/pull/536](https://togithub.com/Unleash/unleash-client-node/pull/536) **Full Changelog**: Unleash/unleash-client-node@v5.0.0...v5.1.0 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 7pm every weekday,before 5am every weekday" in timezone Europe/Madrid, Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/Unleash/unleash). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy44MS4zIiwidXBkYXRlZEluVmVyIjoiMzcuODEuMyIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
What
This PR adds incremental backoff to both metrics posting as well as feature fetching.
It does so by keeping track of the number of failures we've seen and adding
interval + (failureCount * interval)
to the nextsetTimeout
call. Each succeeding call of the same type will decrement failure count, reducing the interval respectively.Discussion points.
I've set maximum failure count to 10 to ensure we never wait forever before the next poll/post of metrics.