Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Backoff based on HTTP statuses #537

Merged
merged 12 commits into from
Nov 21, 2023
Merged

feat: Backoff based on HTTP statuses #537

merged 12 commits into from
Nov 21, 2023

Conversation

chriswk
Copy link
Member

@chriswk chriswk commented Nov 15, 2023

What

This PR adds incremental backoff to both metrics posting as well as feature fetching.
It does so by keeping track of the number of failures we've seen and adding interval + (failureCount * interval) to the next setTimeout call. Each succeeding call of the same type will decrement failure count, reducing the interval respectively.

Discussion points.

I've set maximum failure count to 10 to ensure we never wait forever before the next poll/post of metrics.

@chriswk chriswk self-assigned this Nov 15, 2023
@coveralls
Copy link

coveralls commented Nov 15, 2023

Coverage Status

coverage: 91.222% (-0.2%) from 91.376%
when pulling 75c83e9 on task/obeyHttpStatuses
into 35e0e29 on main.


if (process.env.NODE_ENV !== 'test' && typeof this.timer.unref === 'function') {
this.timer.unref();
}
}

start(): void {
if (typeof this.metricsInterval === 'number' && this.metricsInterval > 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typescript compiler stated that this was no longer necessary

src/math.ts Outdated Show resolved Hide resolved
@@ -89,6 +95,7 @@ export default class Repository extends EventEmitter implements EventEmitter {
instanceId,
projectName,
refreshInterval = 15_000,
maxRefreshInterval = 120_000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expose this in the constructor, it gives consumers the ability to just turn off our backoff. Do we want that? Tempted to use a constant here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's removed, and replaced by putting a maximum backoff of 10 * interval

@@ -812,18 +812,16 @@ acorn@^8.9.0:

agent-base@6, agent-base@^6.0.2:
version "6.0.2"
resolved "https://registry.npmjs.org/agent-base/-/agent-base-6.0.2.tgz"
resolved "https://registry.yarnpkg.com/agent-base/-/agent-base-6.0.2.tgz#49fff58577cfee3f37176feab4c22e00f86d7f77"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The yarn lock update needed or just boyscouting?

Neither is an issue, just curious for context

Copy link
Member Author

@chriswk chriswk Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happened automatically when I ran yarn install.

this.failures = max(this.failures - 1, 0);
return this.nextFetch();
}

async fetch(): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch is now just short of 100 lines. Might be worth breaking this down into smaller methods at this point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. Something like what we did for metrics. I'll have a look.

sighphyre
sighphyre previously approved these changes Nov 16, 2023
Copy link
Member

@sighphyre sighphyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it does what it says on the tin, tests seem to cover it nicely too

Suggested some small things but I'll leave the choices up to you

@chriswk chriswk requested a review from sighphyre November 16, 2023 13:20
@chriswk chriswk force-pushed the task/obeyHttpStatuses branch from 9f9332d to 56d5908 Compare November 16, 2023 13:23
nunogois
nunogois previously approved these changes Nov 20, 2023
Copy link
Member

@nunogois nunogois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@gastonfournier gastonfournier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, will continue reviewing

src/metrics.ts Outdated
}

getInterval(): number {
if(this.metricsInterval === 0 || this.failures === -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should failures be initialized to -1? I can see this working but it makes it hard to reason about when we're checking against -1 here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failures start at 0, if we initialize it to -1 we'd never actually try to post metrics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since you're asking, I guess it isn't as clear as I thought it'd be. extract to a method called stopPosting() instead?

src/metrics.ts Outdated
if (!res.ok) {
if (res.status === 404 || res.status === 403 || res.status == 401) {
this.unrecoverableError(url, res.status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really unrecoverable? If our API is down for a small period of time you may get a 404 response and it may recover later. Or you may get a 403 because your token does not have access to a resource, and then some admin might grant that token permissions. If we make this unrecoverable, they will need a restart.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember from the quick discovery/discussion session we had that we thought maybe not stop polling just treat it as a backoff-response, for the reasons @gastonfournier mentions here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 404:

  • if your api is down, you should be getting 500 (service error) or a type of 502 or 504 (gateway timeouts), not 404, at least from our hosted instances. I guess there are reverse proxies out there that returns 404 if the service they're proxying is not available.

For 401/403:
Currently, you can't give tokens access after creation, so I'd be very surprised if you went from 403 to 200. You either have a client token or not. A client token which gets new access would simply see more features returned, not suddenly move from a FORBIDDEN to an OK response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doing what the Java SDK did instead, immediately log at error level that the API most likely is incorrectly configured and then back off to our maximum backoff (10 * interval), could also be an answer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense! I'm still a bit worried we stop posting and we will not notice what's wrong. If we backoff might mean we get a request every now and then but not hammering our servers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for metrics is ok to be more aggressive and stop posting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed the method to configurationError after discussion with Gaston

src/metrics.ts Outdated
@@ -170,6 +190,19 @@ export default class Metrics extends EventEmitter {
return true;
}

unrecoverableError(url: string, statusCode: number) {
this.emit(UnleashEvents.Warn, `${url} returning ${statusCode}, stopping metrics`);
this.failures = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.failures = -1;
this.metricsInterval = 0;

wouldn't this be enough? Just to avoid dealing with failures = -1 which is a bit confusing IMO

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the benefit in having a value indicate unrecoverable error, but I think it should in that case be a separate variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, we're calling an explicit this.stop() here already, so dropping the -1 seems fair. Will fix.

src/metrics.ts Outdated Show resolved Hide resolved
daveleek
daveleek previously approved these changes Nov 20, 2023
Copy link

@daveleek daveleek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I noticed the implementations are a little bit different between the metrics and the features pollings. Is that intentional?

@chriswk
Copy link
Member Author

chriswk commented Nov 20, 2023

Looks good! I noticed the implementations are a little bit different between the metrics and the features pollings. Is that intentional?

Yes, the metrics is a POST action, whereas the polling is a GET. We're also more OK with losing metrics than not being able to get latest feature configuration.

@chriswk chriswk dismissed stale reviews from daveleek and nunogois via 404d610 November 20, 2023 14:08
gastonfournier
gastonfournier previously approved these changes Nov 20, 2023
Copy link
Contributor

@gastonfournier gastonfournier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@nunogois nunogois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed some approvals got stale after recent changes, so here's my approval in case it unblocks you 👍

@chriswk chriswk merged commit 748d793 into main Nov 21, 2023
4 checks passed
@chriswk chriswk deleted the task/obeyHttpStatuses branch November 21, 2023 08:46
renovate bot referenced this pull request in Unleash/unleash Dec 5, 2023
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [unleash-client](https://togithub.com/Unleash/unleash-client-node) |
[`5.0.0` ->
`5.3.0`](https://renovatebot.com/diffs/npm/unleash-client/5.0.0/5.3.0) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/unleash-client/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/unleash-client/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/unleash-client/5.0.0/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/unleash-client/5.0.0/5.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>Unleash/unleash-client-node (unleash-client)</summary>

###
[`v5.3.0`](https://togithub.com/Unleash/unleash-client-node/releases/tag/v5.3.0)

[Compare
Source](https://togithub.com/Unleash/unleash-client-node/compare/v5.2.0...v5.3.0)

#### What's Changed

- fix: backward compatibility by
[@&#8203;gastonfournier](https://togithub.com/gastonfournier) in
[https://github.com/Unleash/unleash-client-node/pull/540](https://togithub.com/Unleash/unleash-client-node/pull/540)

**Full Changelog**:
Unleash/unleash-client-node@v5.2.0...v5.3.0

###
[`v5.2.0`](https://togithub.com/Unleash/unleash-client-node/releases/tag/v5.2.0)

[Compare
Source](https://togithub.com/Unleash/unleash-client-node/compare/v5.1.0...v5.2.0)

#### What's Changed

- chore(deps): update dependency
[@&#8203;types/semver](https://togithub.com/types/semver) to v7.5.5 by
[@&#8203;renovate](https://togithub.com/renovate) in
[https://github.com/Unleash/unleash-client-node/pull/527](https://togithub.com/Unleash/unleash-client-node/pull/527)
- feat: Backoff based on HTTP statuses by
[@&#8203;chriswk](https://togithub.com/chriswk) in
[https://github.com/Unleash/unleash-client-node/pull/537](https://togithub.com/Unleash/unleash-client-node/pull/537)

**Full Changelog**:
Unleash/unleash-client-node@v5.1.0...v5.2.0

###
[`v5.1.0`](https://togithub.com/Unleash/unleash-client-node/releases/tag/v5.1.0)

[Compare
Source](https://togithub.com/Unleash/unleash-client-node/compare/v5.0.0...v5.1.0)

#### What's Changed

- feat: variant feature status - client specification update by
[@&#8203;Tymek](https://togithub.com/Tymek) in
[https://github.com/Unleash/unleash-client-node/pull/536](https://togithub.com/Unleash/unleash-client-node/pull/536)

**Full Changelog**:
Unleash/unleash-client-node@v5.0.0...v5.1.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "after 7pm every weekday,before 5am
every weekday" in timezone Europe/Madrid, Automerge - At any time (no
schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/Unleash/unleash).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy44MS4zIiwidXBkYXRlZEluVmVyIjoiMzcuODEuMyIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants