Release Shepherd Rotation
The Release Shepherd is a 50% rotation responsible for making the Terraform Provider Google (TPG) releases (https://github.com/hashicorp/terraform-provider-google/wiki/Release-Process) and for maintaining test environments so that merging contributions and making releases are as low-friction as possible.
The current schedule can be viewed in PagerDuty.
- Check the release history for any patch releases and confirm they have been cherrypicked or handled
- Rarely, changes will already be present in the release branch (i.e. due to early patch releases) or won't make sense to migrate forward. In these cases, confirm explicitly with last week's oncall that they've been handled and that the new release will not regress.
- Run the release based on last week's shepherd's release cut: https://github.com/hashicorp/terraform-provider-google/wiki/Release-Process#on-monday
- Check recently filed bugs for issues coming from the new release and flag them with the oncall. Evaluate whether a patch release is needed as discussed in the incident response policy.
- If the oncall has not picked up the issue, you should consider attempting to resolve it.
- Cut the release for next week: https://github.com/hashicorp/terraform-provider-google/wiki/Release-Process#on-wednesday
- Check nightly test runs for new issues that would block the next release or the currently cut release, and resolve them
- Nightly tests* can be found here: GA, Beta
- NOTE: In preparation for 5.0.0 we test the feature branches for the major release some days of the week (GA on Thurs, Beta on Fri). This means that the nightly tests run against main branch will not run on those days.
- For failures that will block cutting the next release, resolve them on main by fixing forward or rolling back changes as appropriate
- For failures that will block the currently cut release from going out, evaluate cherrypicking them
- Resolve (for services without `service/` labels) or label (for services with `service/` labels) failures in the nightly test results
- If you're unable to resolve an issue, file a `test-failure` issue.
- Check 2-3 recent PRs for unrelated or recurrent VCR failures and resolve them, filing a `test-failure` issue if you are unable to.
- If other responsibilities have been addressed, find old `bug` or `persistent-bug` issues and resolve them.
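The triage rule for nightly test failures described in the list above can be summarized as a small decision function. This is only a minimal sketch, not tooling the team uses: the `Failure` type, its fields, and the returned action strings are hypothetical names chosen for illustration. Only the `service/` label prefix and the `test-failure` issue label come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Failure:
    """Hypothetical record describing one failing nightly test."""
    test_name: str
    # Labels associated with the failing service, if any (assumed shape).
    labels: list[str] = field(default_factory=list)
    # Whether the shepherd was able to fix the failure (assumed flag).
    resolvable: bool = True


def triage(failure: Failure) -> str:
    """Return the shepherd's next action for a nightly test failure."""
    has_service_label = any(l.startswith("service/") for l in failure.labels)
    if has_service_label:
        # Services with service/ labels: label the failure.
        return "label"
    if failure.resolvable:
        # Services without service/ labels: the shepherd resolves it.
        return "resolve"
    # Unable to resolve: file a test-failure issue.
    return "file test-failure issue"
```

For example, `triage(Failure("TestAccExample", ["service/compute"]))` returns `"label"`, where both the test name and the label value are made up for the illustration.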
NOTE: The history for a given test will be available in the old TC projects (GA, Beta) until October. At that point, new test history data will have accumulated in the new TeamCity projects and the old projects' data will be outdated.
As a 50% rotation, this should take around half of your time-at-desk. If you'll be unable to spend at least 8 hours on the responsibilities listed above, consider trading shifts with someone who will be able to. Conversely, if you're required to spend 16+ hours working on these responsibilities (outside of exceptional events like weeks where multiple patch releases are required), flag that with the team so that we can bring the time commitment back within expectations.
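The time thresholds above can be written as a quick self-check. This is a sketch only: the function name, the parameter names, and the returned strings are hypothetical, and the 8-hour and 16-hour cutoffs are taken directly from the paragraph above.

```python
def assess_shift_hours(hours: float, exceptional_week: bool = False) -> str:
    """Classify the hours spent on shepherd responsibilities during a shift."""
    if hours < 8:
        # Unable to spend at least 8 hours: consider trading shifts.
        return "consider trading shifts"
    if hours >= 16 and not exceptional_week:
        # 16+ hours outside of exceptional events: raise it with the team.
        return "flag with the team"
    return "within expectations"
```

An exceptional week here means something like a week where multiple patch releases are required, in which case 16+ hours is not treated as a signal to flag.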
Patch releases for current GCP outages are handled by the Google Oncall as defined in the incident response policy. However, if they determine that additional help is required, they may enlist the release shepherd to drive the patch.
On the other hand, cherrypicks are generally handled by the release shepherd, so that the shepherd remains the owner of the weekly minor release branch:
- If the oncall determines that a change doesn't need a patch but we will want to cherrypick it (for example, if there's a major outage on a Friday), the release shepherd will own cherrypicking it.
- If the oncall makes a patch release, they'll work with the release shepherd to ensure that it's included in the next major release
Regressions and incidents should not generally block the next minor release, with two exceptions. If rolling out a new release without resolving a regression introduced in the last release would break additional users, we should cancel the release. Regressions that break the entire provider (such as provider initialization issues that impact many users) will likely prompt a freeze until they are resolved ASAP through a patch. In case of ambiguity, discuss with the oncall and mutually agree on a resolution plan, then communicate that plan in chat. If the oncall is unavailable, substitute a TL.
Note: In general, patches cut past midday Thursday are rare; changes that come up after that point should be cherrypicked into the minor release rather than patched.
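The guidance on when a regression should hold up the next minor release can be sketched as a decision function. This is an illustrative summary under assumptions, not an official policy encoding: the enum, the function, and the parameter names are all hypothetical, and the ordering of the checks is one reasonable reading of the text. Real cases still call for judgment and a conversation with the oncall.

```python
from enum import Enum


class ReleaseAction(Enum):
    PROCEED = "proceed with the minor release"
    CANCEL = "cancel the release"
    FREEZE = "freeze until resolved through a patch"
    DISCUSS = "agree on a plan with the oncall (or a TL) and share it in chat"


def next_release_action(
    breaks_entire_provider: bool,
    breaks_additional_users: bool,
    ambiguous: bool = False,
) -> ReleaseAction:
    """Suggest how an open regression affects the next minor release."""
    if ambiguous:
        # Unclear cases are settled by discussion, not by this function.
        return ReleaseAction.DISCUSS
    if breaks_entire_provider:
        # e.g. provider initialization issues that impact many users.
        return ReleaseAction.FREEZE
    if breaks_additional_users:
        # Releasing without a fix would break more users.
        return ReleaseAction.CANCEL
    # Default: regressions do not generally block the next minor release.
    return ReleaseAction.PROCEED
```

The default branch reflects the stated baseline that regressions and incidents should not generally block the release; the other branches are the exceptions.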