You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In order to reduce manual interventions when DNS is flaky, we want to retry failed certificate operations from the start a fixed number of times, less than half the cert rate limit
Acceptance Criteria
GIVEN an update operation
AND apparently-valid DNS configuration
WHEN the retrieve certificate step fails
THEN we should check the number of retries
AND retry certificate provisioning with a new certificate order
AND increment the number of retries
GIVEN a provision operation
AND apparently-valid DNS configuration
WHEN the retrieve certificate step fails
THEN we should check the number of retries
AND retry certificate provisioning with a new certificate order
AND increment the number of retries
Security considerations
No changes
Implementation sketch
This probably requires one or both of:
manage the retry loop outside of huey
rethink how we break up the cert tasks
Maybe special logic in the failed task handler, something like:
did the task fail on one of the lets encrypt steps?
has it failed N or more times (track this in a new column, most likely, or maybe calculate based on number of challenges or orders?)
if so, kick off a copy of the previous provision/update pipeline? or maybe clean up the models from this pipeline and let the restarter pick it up?
set the failure count
It would be good to do this without doing a bunch of retries first, so we don't burn all the time CAPI/the migrator wait for upgrades/provisioning
Once done, we need to make sure that the new total length of time
The text was updated successfully, but these errors were encountered:
In order to reduce manual interventions when DNS is flaky, we want to retry failed certificate operations from the start a fixed number of times, less than half the cert rate limit
Acceptance Criteria
AND apparently-valid DNS configuration
WHEN the retrieve certificate step fails
THEN we should check the number of retries
AND retry certificate provisioning with a new certificate order
AND increment the number of retries
AND apparently-valid DNS configuration
WHEN the retrieve certificate step fails
THEN we should check the number of retries
AND retry certificate provisioning with a new certificate order
AND increment the number of retries
Security considerations
No changes
Implementation sketch
This probably requires one or both of:
Maybe special logic in the failed task handler, something like:
It would be good to do this without doing a bunch of retries first, so we don't burn all the time CAPI/the migrator wait for upgrades/provisioning
Once done, we need to make sure that the new total length of time
The text was updated successfully, but these errors were encountered: