-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul GH Action workflows #440
Conversation
The test GISAID build completed successfully:
It still fetched from the upstream database when the The test GenBank build failed during upload due to permission issues because I forgot the I'll update the trial prefix to "files/ncov/open/trial/<TRIAL_NAME>/" in order to match the production prefix. I don't think we will ever want to update the ncov production URLs since they are advertised in our SARS-CoV-2 docs. |
4d5b611
to
9ccf797
Compare
Updated GenBank trial prefix in force-push Running a new GenBank trial build. |
GenBank trial build completed successfully and uploaded outputs to I'm doing one final trial run today to test the |
4c15c84
to
b926bd1
Compare
Okay, actual final trial run with final fixes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be possible to combine the gisaid + genbank workflows into one (by adding a bit more conditional logic) - the workflow files are almost identical. Did you choose not to do this to allow the actions page to easily distinguish between the two?
That brings up a related point -- if we are using the actions page to monitor progress for the foreseeable future, is the best view into this to filter by event:scheduled to ignore trial / branch runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you choose not to do this to allow the actions page to easily distinguish between the two?
Yeah, I think it's nice to be able to easily distinguish between GISAID and GenBank workflows. (The pathogen status page currently groups by repo/workflow name). Also, the scheduled GitHub Action workflows do not use inputs, so I think the conditionals to trigger scheduled workflows with different default values could get pretty messy.
if we are using the actions page to monitor progress for the foreseeable future, is the best view into this to filter by event:scheduled to ignore trial / branch runs?
Yup, that's should be the easiest way to monitor the automated runs for now. I think it'd also be reasonable to add a new status page that only filters to the scheduled events.
Update the fetch-and-ingest workflows to use the `trial_name` input to determine whether the workflow should trigger downstream workflows and the S3 destination prefix for uploaded files. This changes the S3 destinations for both GISAID and GenBank test builds. GISAID Old: s3://nextstrain-ncov-private/branch/${GITHUB_BRANCH}/ New: s3://nextstrain-ncov-private/trial/${TRIAL_NAME}/ GenBank: Old: s3://nextstrain-staging/files/ncov/open/branch/${GITHUB_BRANCH} New: s3://nextstrain-data/files/ncov/open/trial/${TRIAL_NAME} This follows the prefix patterns used by other more recent GitHub Action workflows. By using the trial input, we also no longer have to maintain extra copies of the "*-branch" workflows. Note that I'm not updating the ingest-* workflows because they will be made obsolete in the following commit.
Update the fetch-and-ingest workflows to use the `fetch_from_database` input to determine whether the workflow should fetch from upstream databases. This means we no longer have to maintain extra copies of the "ingest-*" workflows. One downside is this input is _not_ included with the repository_dispatch event. However, it seems like everyone's using the workflow_dispatch via the GitHub UI instead of the repository_dispatch anyways: ``` $ gh api /repos/nextstrain/ncov-ingest/actions/runs?event=workflow_dispatch --jq .total_count --paginate 60 $ gh api /repos/nextstrain/ncov-ingest/actions/runs?event=repository_dispatch --jq .total_count --paginate 0 ```
Removes the need to use the `bin/write-envdir` script. Also removes the need to use the roundabout `--exec env` invocations, which means we no longer need to pass the `--cores`, `--resources`, and `--printshellcmds` flags to Snakemake. These are automatically passed through the default `snakemake` exec. https://github.com/nextstrain/cli/blob/4fa53169059047454fb10b55a584395239b61a1a/nextstrain/cli/command/build.py#L147 https://github.com/nextstrain/cli/blob/4fa53169059047454fb10b55a584395239b61a1a/nextstrain/cli/command/build.py#L190-L236
Allows the GH Action workflows to stay connected to the AWS Batch jobs when using the latest pathogen-repo-build workflow. This way, the status of the GH Action workflow accurately reflects the status of the ingest workflow. We can then also use the Nextstrain pathogen workflow status page¹ to monitor the success/failure of the automated workflows. ¹ https://nextstrain.github.io/status/pathogen-workflows.html
Prevent the trial runs from being mixed with the automated runs by sending Slack notifications to the `TEST_SLACK_CHANNEL` (#scratch).
If the GH Action event trigger is not `workflow_dispatch`, then set the default value for `FETCH_FROM_DATABASE` to `true`. Otherwise, use the provided input value. Previously the `FETCH_FROM_DATABASE` was always set to `true` because the ternary statement would return the value after `||` when the input value was `false`. Note, the expression ``` FETCH_FROM_DATABASE: ${{ inputs.fetch_from_database == '' && true || inputs.fetch_from_database }} ``` also does _not_ work because GH Action expressions performs loose equality comparisons that coerces the type to a number if types do not match.¹ This coerced `false` to `0` and the empty string to `0` leading the ternary operator to always return the default `true` value. ¹ https://docs.github.com/en/actions/learn-github-actions/expressions#operators
b926bd1
to
3d92f48
Compare
Ah, missed today's automated runs. I'll merge EOD today in case there are other comments. |
Thanks for cleaning up the workflows @joverlee521! I just tried this out and it took a bit of time to understand how to configure the workflow input params to get it to do what I want (run on branch, do a test run, no fetch). Maybe there's scope for a single "test" workflow that simply does repository dispatch with these params? The new single workflow is great for maintenance but might cause errors when misused due to complexity in selecting the right inputs. A neutered test dispatch might reduce accidental misuse likelihood. |
Oh wow, I just discovered that the logs are streamed from AWS batch - that's amazing! Great job @joverlee521! |
This is all thanks to @tsibley's update to the |
Are there additional docs that I can add to clarify the use of the workflow input params? If we wanted to support repository dispatch, we would have to complicate the workflows a bit to check for |
@@ -99,6 +113,7 @@ jobs: | |||
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | |||
PAT_GITHUB_DISPATCH: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_WORKFLOW_DISPATCH }} | |||
TRIAL_NAME: ${{ inputs.trial_name }} | |||
FETCH_FROM_DATABASE: ${{ inputs.fetch_from_database != '' && inputs.fetch_from_database || true }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, combined with the type: boolean
of inputs.fetch_from_database
, is going to result in FETCH_FROM_DATABASE
being either the strings true
or false
, right?
And then the Bash condition (earlier up) is [[ "$FETCH_FROM_DATABASE" ]]
, which evaluates to true (exits 0) for both [[ "true" ]]
and [[ "false" ]]
. Yeah?
So does setting inputs.fetch_from_database
to false
actually work? (or am I missing something?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see it gets changed/fixed in a later commit, but also #440 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love to see this. :-)
- name: notify_pipeline_failed | ||
if: ${{ failure() }} | ||
run: ./vendored/notify-on-job-fail Ingest nextstrain/ncov-ingest | ||
$CONFIG_OVERRIDES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this opens up a command-line injection via inputs.trial_name
(though in practice it's probably fine given our usage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH, I've repeated this pattern across other GH Action workflows 🙈
What would be the safer thing to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best thing is building up an array of strings, which can then be expanded with quoting to prevent subsequent word splitting. But with the config overrides being computed in a separate job, you can't pass an array, only a single long string. One way forward with that is to use shell-quoting semantics inside that single long string (i.e. nested quoting), but that gets tedious and is easy to mess up. Stepping back, why did the config override get moved into a separate job? Does it need to be?
Or, as an alternative to passing around a command-line snippet in a string/env var, we could produce another YAML file and pass that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stepping back, why did the config override get moved into a separate job? Does it need to be?
It had to get moved to a separate job because the config override is not currently part of the pathogen-repo-build
workflow. Since configs are pretty different across pathogen workflows, I didn't want to shoehorn it into the reusable workflow.
Or, as an alternative to passing around a command-line snippet in a string/env var, we could produce another YAML file and pass that instead?
Ah, so build a custom config YAML that is uploaded/download as an artifact of the GH Action workflow. I think this is doable with updating pathogen-repo-build
workflow to take a artifact name as an input and download the artifact as part of the reusable workflow. Yeah, this seems like the way to go if we are unable fully standardize configs across pathogens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It had to get moved to a separate job because the config override is not currently part of the
pathogen-repo-build
workflow. Since configs are pretty different across pathogen workflows, I didn't want to shoehorn it into the reusable workflow.
Can't it go in the run
input of the pathogen-repo-build
workflow call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, yes 🤦♀️
run: | | ||
config="--config" | ||
|
||
if [[ "$FETCH_FROM_DATABASE" ]]; then | ||
if "$FETCH_FROM_DATABASE"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relies on running the programs true
and false
... 🫣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for flagging! Updating in #444
Prompted by post-merge review #440 (comment)
Prompted by post-merge review #440 (comment) Instead of running `true` and `false` programs, explicitly test that `FETCH_FROM_DATABASE` is true.
Prompted by post-merge review #440 (comment) Instead of running `true` and `false` programs, explicitly test that `FETCH_FROM_DATABASE` is true.
Description of proposed changes
Overhaul the GH Action workflows with the main goal to use the shared
pathogen-repo-build
workflow.This allows the GH Action workflow success/failure status to match the AWS Batch job status and will allow us to monitor the success/failure of automated workflows via the Pathogen workflow status page.
I made other improvements to use workflow inputs in order to reduce the 8 different GH Action workflows down to 2 workflows. See commits for details.
Related issue(s)
Related to nextstrain/.github#46
Checklist
AWS_DEFAULT_REGION
repo secret