Overhaul GH Action workflows #440

joverlee521 · 2024-02-28T21:40:25Z

Description of proposed changes

Overhaul the GH Action workflows with the main goal to use the shared pathogen-repo-build workflow.
This allows the GH Action workflow success/failure status to match the AWS Batch job status and will allow us to monitor the success/failure of automated workflows via the Pathogen workflow status page.

I made other improvements to use workflow inputs in order to reduce the 8 different GH Action workflows down to 2 workflows. See commits for details.

Related issue(s)

Related to nextstrain/.github#46

Checklist

Checks pass
Test GISAID build
Test GenBank build
Post-merge: Remove the AWS_DEFAULT_REGION repo secret

joverlee521 · 2024-02-29T20:53:07Z

The test GISAID build completed successfully:

using the trial name and uploaded files to to s3://nextstrain-ncov-private/trial/overhaul-gh-action-workflows/
using the custom image nextstrain/ncov-ingest:branch-fix-image
using the TEST_SLACK_CHANNEL variable and sent messages to #scratch

It still fetched from the upstream database when the fetch_from_database input was false, but I believe this should be fixed with 4d5b611.

The test GenBank build failed during upload due to permission issues because I forgot the NextstrainNcovIngestUploader policies allow the "files/ncov/open/" prefix (not "files/workflows/ncov/open/").

I'll update the trial prefix to "files/ncov/open/trial/<TRIAL_NAME>/" in order to match the production prefix. I don't think we will ever want to update the ncov production URLs since they are advertised in our SARS-CoV-2 docs.

joverlee521 · 2024-02-29T20:59:21Z

Updated GenBank trial prefix in force-push

Running a new GenBank trial build.

joverlee521 · 2024-03-01T18:23:42Z

GenBank trial build completed successfully and uploaded outputs to s3://nextstrain-data/files/ncov/open/trial/overhaul-gh-action-workflows/.

I'm doing one final trial run today to test the fetch_from_database input which should be fixed by 4c15c84. If all goes well, I'll merge this on Monday so that I can monitor the automated runs.

joverlee521 · 2024-03-01T18:49:23Z

Okay, actual final trial run with final fixes to fetch_from_database in b926bd1 🤞

.github/workflows/fetch-and-ingest-genbank-master.yml

jameshadfield · 2024-03-04T01:26:15Z

.github/workflows/fetch-and-ingest-genbank-master.yml

I think it'd be possible to combine the gisaid + genbank workflows into one (by adding a bit more conditional logic) - the workflow files are almost identical. Did you choose not to do this to allow the actions page to easily distinguish between the two?

That brings up a related point -- if we are using the actions page to monitor progress for the foreseeable future, is the best view into this to filter by event:scheduled to ignore trial / branch runs?

Did you choose not to do this to allow the actions page to easily distinguish between the two?

Yeah, I think it's nice to be able to easily distinguish between GISAID and GenBank workflows. (The pathogen status page currently groups by repo/workflow name). Also, the scheduled GitHub Action workflows do not use inputs, so I think the conditionals to trigger scheduled workflows with different default values could get pretty messy.

if we are using the actions page to monitor progress for the foreseeable future, is the best view into this to filter by event:scheduled to ignore trial / branch runs?

Yup, that's should be the easiest way to monitor the automated runs for now. I think it'd also be reasonable to add a new status page that only filters to the scheduled events.

Update the fetch-and-ingest workflows to use the `trial_name` input to determine whether the workflow should trigger downstream workflows and the S3 destination prefix for uploaded files. This changes the S3 destinations for both GISAID and GenBank test builds. GISAID Old: s3://nextstrain-ncov-private/branch/${GITHUB_BRANCH}/ New: s3://nextstrain-ncov-private/trial/${TRIAL_NAME}/ GenBank: Old: s3://nextstrain-staging/files/ncov/open/branch/${GITHUB_BRANCH} New: s3://nextstrain-data/files/ncov/open/trial/${TRIAL_NAME} This follows the prefix patterns used by other more recent GitHub Action workflows. By using the trial input, we also no longer have to maintain extra copies of the "*-branch" workflows. Note that I'm not updating the ingest-* workflows because they will be made obsolete in the following commit.

Update the fetch-and-ingest workflows to use the `fetch_from_database` input to determine whether the workflow should fetch from upstream databases. This means we no longer have to maintain extra copies of the "ingest-*" workflows. One downside is this input is _not_ included with the repository_dispatch event. However, it seems like everyone's using the workflow_dispatch via the GitHub UI instead of the repository_dispatch anyways: ``` $ gh api /repos/nextstrain/ncov-ingest/actions/runs?event=workflow_dispatch --jq .total_count --paginate 60 $ gh api /repos/nextstrain/ncov-ingest/actions/runs?event=repository_dispatch --jq .total_count --paginate 0 ```

Removes the need to use the `bin/write-envdir` script. Also removes the need to use the roundabout `--exec env` invocations, which means we no longer need to pass the `--cores`, `--resources`, and `--printshellcmds` flags to Snakemake. These are automatically passed through the default `snakemake` exec. https://github.com/nextstrain/cli/blob/4fa53169059047454fb10b55a584395239b61a1a/nextstrain/cli/command/build.py#L147 https://github.com/nextstrain/cli/blob/4fa53169059047454fb10b55a584395239b61a1a/nextstrain/cli/command/build.py#L190-L236

Allows the GH Action workflows to stay connected to the AWS Batch jobs when using the latest pathogen-repo-build workflow. This way, the status of the GH Action workflow accurately reflects the status of the ingest workflow. We can then also use the Nextstrain pathogen workflow status page¹ to monitor the success/failure of the automated workflows. ¹ https://nextstrain.github.io/status/pathogen-workflows.html

There are many instances in that past where we've made temp commits (e.g. ddbf15d and aab1f5f) to edit the Docker image used for a test run of the workflow. This input follows the pattern of more recent GH Action workflows to use an input to modify the Docker image.

Prevent the trial runs from being mixed with the automated runs by sending Slack notifications to the `TEST_SLACK_CHANNEL` (#scratch).

If the GH Action event trigger is not `workflow_dispatch`, then set the default value for `FETCH_FROM_DATABASE` to `true`. Otherwise, use the provided input value. Previously the `FETCH_FROM_DATABASE` was always set to `true` because the ternary statement would return the value after `||` when the input value was `false`. Note, the expression ``` FETCH_FROM_DATABASE: ${{ inputs.fetch_from_database == '' && true || inputs.fetch_from_database }} ``` also does _not_ work because GH Action expressions performs loose equality comparisons that coerces the type to a number if types do not match.¹ This coerced `false` to `0` and the empty string to `0` leading the ternary operator to always return the default `true` value. ¹ https://docs.github.com/en/actions/learn-github-actions/expressions#operators

joverlee521 · 2024-03-04T18:32:20Z

Ah, missed today's automated runs. I'll merge EOD today in case there are other comments.

corneliusroemer · 2024-03-05T13:28:44Z

Thanks for cleaning up the workflows @joverlee521!

I just tried this out and it took a bit of time to understand how to configure the workflow input params to get it to do what I want (run on branch, do a test run, no fetch).

Maybe there's scope for a single "test" workflow that simply does repository dispatch with these params? The new single workflow is great for maintenance but might cause errors when misused due to complexity in selecting the right inputs. A neutered test dispatch might reduce accidental misuse likelihood.

corneliusroemer · 2024-03-05T13:30:31Z

Oh wow, I just discovered that the logs are streamed from AWS batch - that's amazing! Great job @joverlee521!

joverlee521 · 2024-03-05T18:36:17Z

Oh wow, I just discovered that the logs are streamed from AWS batch - that's amazing!

This is all thanks to @tsibley's update to the pathogen-repo-build workflow to let it stay attached to the AWS Batch jobs 🙌

joverlee521 · 2024-03-05T18:39:19Z

I just tried this out and it took a bit of time to understand how to configure the workflow input params to get it to do what I want (run on branch, do a test run, no fetch).

Maybe there's scope for a single "test" workflow that simply does repository dispatch with these params? The new single workflow is great for maintenance but might cause errors when misused due to complexity in selecting the right inputs. A neutered test dispatch might reduce accidental misuse likelihood.

Are there additional docs that I can add to clarify the use of the workflow input params?

If we wanted to support repository dispatch, we would have to complicate the workflows a bit to check for github.event.client_payload.<input_name> params in addition to the input.<input_name> checks. I'm not sure that's worth adding right now. I'm not too worried about "misuse" since cancelling the GitHub workflow run will also cancel the attached AWS Batch job now.

tsibley · 2024-04-18T22:43:02Z

.github/workflows/fetch-and-ingest-genbank-master.yml

@@ -99,6 +113,7 @@ jobs:
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        PAT_GITHUB_DISPATCH: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_WORKFLOW_DISPATCH }}
        TRIAL_NAME: ${{ inputs.trial_name }}
+        FETCH_FROM_DATABASE: ${{ inputs.fetch_from_database != '' && inputs.fetch_from_database || true }}


This, combined with the type: boolean of inputs.fetch_from_database, is going to result in FETCH_FROM_DATABASE being either the strings true or false, right?

And then the Bash condition (earlier up) is [[ "$FETCH_FROM_DATABASE" ]], which evaluates to true (exits 0) for both [[ "true" ]] and [[ "false" ]]. Yeah?

So does setting inputs.fetch_from_database to false actually work? (or am I missing something?)

Oh, I see it gets changed/fixed in a later commit, but also #440 (comment).

tsibley · 2024-04-18T22:44:34Z

bin/write-envdir

Love to see this. :-)

tsibley · 2024-04-18T22:49:44Z

.github/workflows/fetch-and-ingest-genbank-master.yml

-    - name: notify_pipeline_failed
-      if: ${{ failure() }}
-      run: ./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest
+            $CONFIG_OVERRIDES


FWIW, this opens up a command-line injection via inputs.trial_name (though in practice it's probably fine given our usage).

OH, I've repeated this pattern across other GH Action workflows 🙈

What would be the safer thing to do here?

The best thing is building up an array of strings, which can then be expanded with quoting to prevent subsequent word splitting. But with the config overrides being computed in a separate job, you can't pass an array, only a single long string. One way forward with that is to use shell-quoting semantics inside that single long string (i.e. nested quoting), but that gets tedious and is easy to mess up. Stepping back, why did the config override get moved into a separate job? Does it need to be?

Or, as an alternative to passing around a command-line snippet in a string/env var, we could produce another YAML file and pass that instead?

Stepping back, why did the config override get moved into a separate job? Does it need to be?

It had to get moved to a separate job because the config override is not currently part of the pathogen-repo-build workflow. Since configs are pretty different across pathogen workflows, I didn't want to shoehorn it into the reusable workflow.

Or, as an alternative to passing around a command-line snippet in a string/env var, we could produce another YAML file and pass that instead?

Ah, so build a custom config YAML that is uploaded/download as an artifact of the GH Action workflow. I think this is doable with updating pathogen-repo-build workflow to take a artifact name as an input and download the artifact as part of the reusable workflow. Yeah, this seems like the way to go if we are unable fully standardize configs across pathogens.

It had to get moved to a separate job because the config override is not currently part of the pathogen-repo-build workflow. Since configs are pretty different across pathogen workflows, I didn't want to shoehorn it into the reusable workflow.

Can't it go in the run input of the pathogen-repo-build workflow call?

Hah, yes 🤦‍♀️

tsibley · 2024-04-18T22:53:14Z

.github/workflows/fetch-and-ingest-genbank-master.yml

        run: |
          config="--config"

-          if [[ "$FETCH_FROM_DATABASE" ]]; then
+          if "$FETCH_FROM_DATABASE"; then


This relies on running the programs true and false... 🫣

Thanks for flagging! Updating in #444

Prompted by post-merge review #440 (comment)

Prompted by post-merge review #440 (comment) Instead of running `true` and `false` programs, explicitly test that `FETCH_FROM_DATABASE` is true.

joverlee521 requested a review from a team February 28, 2024 21:40

joverlee521 mentioned this pull request Feb 28, 2024

Centralize existing pathogen build workflows nextstrain/.github#46

Closed

6 tasks

joverlee521 force-pushed the overhaul-gh-action-workflows branch from 4d5b611 to 9ccf797 Compare February 29, 2024 20:57

joverlee521 force-pushed the overhaul-gh-action-workflows branch from 4c15c84 to b926bd1 Compare March 1, 2024 18:46

jameshadfield reviewed Mar 4, 2024

View reviewed changes

joverlee521 added 7 commits March 4, 2024 10:21

fetch-and-ingest: Use TEST_SLACK_CHANNEL for trials

bae20ea

Prevent the trial runs from being mixed with the automated runs by sending Slack notifications to the `TEST_SLACK_CHANNEL` (#scratch).

joverlee521 force-pushed the overhaul-gh-action-workflows branch from b926bd1 to 3d92f48 Compare March 4, 2024 18:21

joverlee521 merged commit d06e653 into master Mar 5, 2024
1 check passed

joverlee521 deleted the overhaul-gh-action-workflows branch March 5, 2024 01:26

tsibley reviewed Apr 18, 2024

View reviewed changes

joverlee521 added a commit that referenced this pull request Apr 19, 2024

fetch-and-ingest: stop running true/false programs

512e2ea

Prompted by post-merge review #440 (comment)

joverlee521 added a commit that referenced this pull request Apr 19, 2024

fetch-and-ingest: Explicitly test FETCH_FROM_DATABASE

6abb9a9

Prompted by post-merge review #440 (comment) Instead of running `true` and `false` programs, explicitly test that `FETCH_FROM_DATABASE` is true.

joverlee521 added a commit that referenced this pull request Apr 19, 2024

fetch-and-ingest: Explicitly test FETCH_FROM_DATABASE

3becb52

Prompted by post-merge review #440 (comment) Instead of running `true` and `false` programs, explicitly test that `FETCH_FROM_DATABASE` is true.

This was referenced Apr 19, 2024

fetch-and-ingest: Explicitly test FETCH_FROM_DATABASE #444

Merged

How should we standardize GitHub Action workflows per repo? nextstrain/pathogen-repo-guide#25

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overhaul GH Action workflows #440

Overhaul GH Action workflows #440

joverlee521 commented Feb 28, 2024 •

edited

Loading

joverlee521 commented Feb 29, 2024

joverlee521 commented Feb 29, 2024

joverlee521 commented Mar 1, 2024

joverlee521 commented Mar 1, 2024

jameshadfield Mar 4, 2024

joverlee521 Mar 4, 2024

joverlee521 commented Mar 4, 2024

corneliusroemer commented Mar 5, 2024

corneliusroemer commented Mar 5, 2024

joverlee521 commented Mar 5, 2024

joverlee521 commented Mar 5, 2024

tsibley Apr 18, 2024

tsibley Apr 18, 2024 •

edited

Loading

tsibley Apr 18, 2024

tsibley Apr 18, 2024

joverlee521 Apr 18, 2024

tsibley Apr 22, 2024

joverlee521 Apr 23, 2024

tsibley Apr 23, 2024

joverlee521 Apr 23, 2024

tsibley Apr 18, 2024

joverlee521 Apr 19, 2024

Overhaul GH Action workflows #440

Overhaul GH Action workflows #440

Conversation

joverlee521 commented Feb 28, 2024 • edited Loading

Description of proposed changes

Related issue(s)

Checklist

joverlee521 commented Feb 29, 2024

joverlee521 commented Feb 29, 2024

joverlee521 commented Mar 1, 2024

joverlee521 commented Mar 1, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joverlee521 commented Mar 4, 2024

corneliusroemer commented Mar 5, 2024

corneliusroemer commented Mar 5, 2024

joverlee521 commented Mar 5, 2024

joverlee521 commented Mar 5, 2024

Choose a reason for hiding this comment

tsibley Apr 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joverlee521 commented Feb 28, 2024 •

edited

Loading

tsibley Apr 18, 2024 •

edited

Loading