Add ability to mask absolute version numbers in Nextflow tests #40
Description
The Nextflow configuration tests have the pipeline's version repeated multiple times in the expected values. That means that any version bump requires updating every test, like with uclahs-cds/pipeline-call-gSV#151 (comment).
In order to eliminate those kinds of frustrating PRs, this PR adds an optional new `version_fields` parameter for tests. That parameter is (as might be expected) a list of fields that contain the version. Any field listed should also have its embedded version number(s) updated to the string `VER.SI.ON`.

When a test is run, the true/current version number is parsed from the `manifest.version` field of the raw test output. Each field in `version_fields` then has that exact version number replaced with `VER.SI.ON` before the comparison with the expected results.

The effect of that is that the string `VER.SI.ON` always represents one specific version throughout the entire file, even if that specific version is variable. That means that the test in the pipeline-call-gSV example above would not have needed to be modified.

It also means that we'll catch if a version number is incorrectly hard-coded somewhere, e.g. `manifest.version = "1.0.0"; params.randomvar = "pipeline_${manifest.version}/output_1.0.0/"`. If the test JSON were written to assume that both numbers would update, i.e. `"randomvar": "pipeline_VER.SI.ON/output_VER.SI.ON/"`, then that would fail once `manifest.version` was updated to anything other than `1.0.0`.

I've tested this locally on pipeline-recalibrate-BAM and pipeline-call-gSV.
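
For reviewers who want to see the masking behavior concretely, here is a minimal sketch of the idea. It is not the code in this PR: the helper name `mask_versions`, the dotted-path format for entries in `version_fields`, and the nested-dict layout of the parsed output are all assumptions made for illustration.

```python
import copy

VERSION_PLACEHOLDER = "VER.SI.ON"


def mask_versions(raw_output, version_fields):
    """Return a copy of raw_output with the current version masked.

    Hypothetical helper. Assumes raw_output is a nested dict of parsed
    config values and each entry of version_fields is a dotted path to
    a string value, e.g. "params.randomvar".
    """
    # The true/current version comes from the raw output itself.
    version = raw_output["manifest"]["version"]
    masked = copy.deepcopy(raw_output)

    for field in version_fields:
        *parents, leaf = field.split(".")
        node = masked
        for key in parents:
            node = node[key]
        # Literal replacement of that exact version string only.
        node[leaf] = node[leaf].replace(version, VERSION_PLACEHOLDER)

    return masked


fields = ["manifest.version", "params.randomvar"]
expected = {
    "manifest": {"version": "VER.SI.ON"},
    "params": {"randomvar": "pipeline_VER.SI.ON/output_VER.SI.ON/"},
}

# Both version numbers track manifest.version: matches the expected values.
good = {
    "manifest": {"version": "1.1.0"},
    "params": {"randomvar": "pipeline_1.1.0/output_1.1.0/"},
}
print(mask_versions(good, fields) == expected)

# One number is hard-coded to 1.0.0: it survives masking, so the test fails.
bad = {
    "manifest": {"version": "1.1.0"},
    "params": {"randomvar": "pipeline_1.1.0/output_1.0.0/"},
}
print(mask_versions(bad, fields) == expected)
```

Because the replacement is a literal match on the parsed version, a stale hard-coded number is left untouched and shows up as a mismatch against the expected values.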
Checklist
This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
Disclosing PHI is a major problem[^1]. Even a small leak can be costly[^2].
This PR does NOT contain germline genetic data[^3], RNA-Seq, DNA methylation, microbiome or other molecular data[^4].
`.png`, `.jpeg`), `.pdf`, `.RData`, `.xlsx`, `.doc`, `.ppt`, or other output files.
To automatically exclude such files using a .gitignore file, see here for example.
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have set up or verified the `main` branch protection rule following the GitHub standards before opening this pull request.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have added the major changes included in this pull request to the `CHANGELOG.md` under the next release version or unreleased, and updated the date.

Footnotes
[^1]: UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records
[^2]: The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.
[^3]: Genetic information is considered PHI. Forensic assays can identify patients with as few as 21 SNPs.
[^4]: RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.