- Start Date: 2022-09-20
- RFC Type: informational
- RFC PR: #12
eng-pipes (our internal service for handling webhooks) attempts to auto-retry GitHub Actions builds for getsentry (internal sentry) in two cases:
- any job which fails the `ensure docker image` step
- any failed required job on the primary branch
the latter was recently disabled after we discovered that it was broken and was also blocking internal messaging.
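as a rough illustration of the two retry rules described above, here is a minimal sketch. it is not the actual eng-pipes code: the `FailedJob` type, its fields, the `should_retry` helper, and the `master` default for the primary branch are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Step name taken from the RFC text; everything else here is hypothetical.
ENSURE_IMAGE_STEP = "ensure docker image"


@dataclass(frozen=True)
class FailedJob:
    """Hypothetical summary of a failed GitHub Actions job."""
    failed_step: str   # name of the step that failed
    is_required: bool  # whether the job is a required check
    branch: str        # branch the workflow ran on


def should_retry(job: FailedJob, primary_branch: str = "master") -> bool:
    """Return True if either retry rule from the RFC applies.

    The default primary branch name is an assumption for illustration.
    """
    # Rule 1: any job which fails the "ensure docker image" step.
    if job.failed_step == ENSURE_IMAGE_STEP:
        return True
    # Rule 2: any failed required job on the primary branch
    # (this is the rule that was recently disabled).
    return job.is_required and job.branch == primary_branch
```

under these assumptions, a required job failing its test step on a feature branch would not be retried, while the same failure on the primary branch would be.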
the proposal is to remove this functionality entirely.
- dev-infra believes it is more important to improve job reliability than to invest in a big-hammer retry, which is more likely to lead to ignoring the actual problems
- it would require significant investment to make it work properly
- removing this feature removes complexity in `eng-pipes`
we've recently invested a lot in reducing the flakiness of setup tasks:
- using ghcr.io instead of dockerhub
- using prebuilt wheels from internal pypi
- caching volta / npm / yarn
- pinning requirements
- pinning github actions
- fixing infinite hangs in caching
we also already have 5x retries for python tests. we believe that is too high, but it is generally a better retry mechanism than rerunning the whole job. in the future we'd like to reduce it, as it enables flaky tests as much as it improves the CI experience; however, that is out of scope for this rfc.
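to make the "enables flaky tests" point concrete, here is a small sketch of how retries hide flakiness. it assumes each attempt fails independently with the same probability, which real flaky tests may not satisfy, and it treats the number of attempts as a parameter, since whether "5x" means five attempts in total or five reruns after the first failure is not specified here. `pass_probability` is a hypothetical helper, not part of any test runner.

```python
def pass_probability(fail_rate: float, attempts: int) -> float:
    """Chance that a test passes at least once in `attempts` tries.

    Assumes every attempt fails independently with probability
    `fail_rate` (a simplifying assumption for illustration).
    """
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # The test only fails overall if every single attempt fails.
    return 1.0 - fail_rate ** attempts


# A test that fails 20% of the time, with and without retries.
single = pass_probability(0.2, attempts=1)
retried = pass_probability(0.2, attempts=5)
print(f"1 attempt: {single:.5f}, 5 attempts: {retried:.5f}")
```

under that independence assumption, a test that fails one run in five would almost never fail the build with five attempts, so there is little signal left to prompt anyone to fix it.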
I cannot find any successful transactions of this feature in the ENG-PIPES sentry project -- there are however (resolved) failures.
the other option is to invest in fixing and supporting this functionality.
the main drawback of removal is that, if this functionality actually worked, it would potentially improve the CI experience.
- dev-infra agrees with this plan but wants to get input before moving forward