Retry cert fetching from the top #225

Open

bengerman13 opened this issue Dec 2, 2021 · 2 comments

@bengerman13
Contributor

In order to reduce manual intervention when DNS is flaky, we want to retry failed certificate operations from the start a fixed number of times, keeping that number under half the certificate rate limit. (A rough sketch of the retry check follows the acceptance criteria below.)

Acceptance Criteria

  • GIVEN an update operation
    AND apparently-valid DNS configuration
    WHEN the retrieve certificate step fails
    THEN we should check the number of retries
    AND retry certificate provisioning with a new certificate order
    AND increment the number of retries
  • GIVEN a provision operation
    AND apparently-valid DNS configuration
    WHEN the retrieve certificate step fails
    THEN we should check the number of retries
    AND retry certificate provisioning with a new certificate order
    AND increment the number of retries
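
A minimal sketch of the retry bookkeeping these criteria describe, assuming a hypothetical `Operation` record with a new retry-count column; the `MAX_CERT_RETRIES` value, the field names, and the `restart_pipeline` callback are illustrative, not existing broker code:

```python
from dataclasses import dataclass

# Keep the cap under half the certificate rate limit, per the description above.
MAX_CERT_RETRIES = 2


@dataclass
class Operation:
    """Stand-in for the broker's operation record, with a new retry-count column."""
    cert_retries: int = 0
    state: str = "in progress"


def handle_retrieve_certificate_failure(operation, restart_pipeline):
    """On a failed retrieve-certificate step: check the count, retry, increment."""
    if operation.cert_retries >= MAX_CERT_RETRIES:
        operation.state = "failed"       # out of retries; needs manual intervention
        return
    operation.cert_retries += 1          # increment the number of retries
    restart_pipeline(operation)          # new certificate order, start from the top
```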

Security considerations

No changes

Implementation sketch

This probably requires one or both of:

  • manage the retry loop outside of huey
  • rethink how we break up the cert tasks

Maybe special logic in the failed-task handler, something like the following (a rough sketch appears after this list):

  1. did the task fail on one of the Let's Encrypt steps?
  2. has it failed N or more times? (track this in a new column, most likely, or maybe calculate it based on the number of challenges or orders?)
  3. if so, kick off a copy of the previous provision/update pipeline? or maybe clean up the models from this pipeline and let the restarter pick it up?
  4. set the failure count
    It would be good to do this without doing a bunch of task-level retries first, so we don't burn all the time CAPI/the migrator will wait for upgrades/provisioning.

Once done, we need to make sure that the new total length of time (retries included) still fits within what CAPI/the migrator will wait.
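
A rough sketch of that failed-task-handler idea, using huey's error signal. The `huey` app import path, the `is_lets_encrypt_step` / `operation_for_task` / `requeue_from_start` helpers, and the `cert_retries` column are all assumptions for illustration, not existing broker code:

```python
from huey.signals import SIGNAL_ERROR

from broker.tasks.huey import huey  # assumed location of the huey app instance

MAX_CERT_RETRIES = 2  # keep under half the certificate rate limit


@huey.signal(SIGNAL_ERROR)
def restart_cert_pipeline_on_failure(signal, task, exc=None):
    # 1. did the task fail on one of the Let's Encrypt steps?
    if not is_lets_encrypt_step(task):
        return

    operation = operation_for_task(task)  # hypothetical lookup helper

    # 2. has it failed N or more times? (tracked in a new column)
    if operation.cert_retries >= MAX_CERT_RETRIES:
        return  # fall through to normal failure handling

    # 4. set the failure count before re-queuing
    operation.cert_retries += 1

    # 3. clean up the models from this attempt and kick off the
    #    provision/update pipeline again from the top
    operation.discard_order_and_challenges()
    requeue_from_start(operation)
```

Handling this in a signal handler keeps the retry loop outside of huey's own task retries, which lines up with the first bullet in the implementation sketch and avoids burning the time CAPI/the migrator will wait.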

@markdboyd
Contributor

@bengerman13 Is this ticket still relevant or important for us to address?

@bengerman13
Contributor Author

This is probably still interesting, but I've been out of the loop on how much this drives customer issues lately.
