Add capability for more complicated mocks #26
I'm completely open to feedback here. If I hadn't just spent a week littering our organization with regression tests I probably would have added a [...] I'm also trying to walk the line between enforcing a restricted coding standard on the pipeline configurations and supporting reasonable use cases. In this case specifically the config seemed reasonable as written, so I wanted to support it with added complexity in this tool.
Here's a quick rundown of possible solutions and why I don't like them:
The least-bad approach seems to be this dynamic mocking, but I welcome any discussion about that.
Upon further consideration, I can't think of any other solutions that wouldn't be more complicated than the ones above. I'm OK with adopting this dynamic mocking if there are no concerns or additional suggestions from others.
Agreed, all other solutions come with their own headaches. I like that this approach won't affect the current tests.
LGTM, assuming this would work with any number and combination of input tumor and normal BAM files.
Description
This PR adds the capability for more complicated mocks that can return different values depending upon the arguments.
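As a rough illustration of the idea (not the code in this PR), the sketch below builds a mock whose return value depends on its arguments. It keys the responses on the JSON-encoded argument array and fails loudly on any unexpected call, which mirrors the behavior described further down. The helper name `make_dynamic_mock`, the file paths, the sample IDs, and the shape of the returned header are all hypothetical.

```python
import json


def make_dynamic_mock(name, responses):
    """Build a stand-in for `name` that chooses its return value by arguments.

    `responses` maps a JSON-encoded argument list to the value to return.
    (Hypothetical helper, for illustration only.)
    """
    def mock(*args):
        # The arguments are always encoded as a JSON array,
        # even when the function takes zero or one argument.
        key = json.dumps(list(args))
        if key not in responses:
            # Report the unexpected arguments and fail.
            raise KeyError(f"{name} called with unmocked arguments: {key}")
        return responses[key]

    return mock


# Hypothetical paths and sample IDs: each BAM gets its own header,
# so a "sample IDs must differ" check can pass.
parse_bam_header = make_dynamic_mock(
    "parse_bam_header",
    {
        '["/data/normal.bam"]': {"SM": "sample_normal"},
        '["/data/tumor.bam"]': {"SM": "sample_tumor"},
    },
)

# A zero-argument function is keyed on the empty array.
get_default = make_dynamic_mock("get_default", {"[]": "fallback"})
```

With a single static return value, both BAMs would report the same sample ID and the pipeline's check would reject the config; keying on arguments avoids that without touching existing static mocks.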
This is specifically to handle configs like this:
The pipeline accepts paired BAMs, and it performs the reasonable check that those BAMs do not have matching sample IDs. However, our only mockable hook in there is `parse_bam_header`, and returning the same value twice will cause the configuration to fail.
So, I'm adding a layer of complexity with "dynamic mocks" that look like this:
The function name has `DYNAMIC|` prepended to it, and there is another level that maps from the function's arguments to the return value. If the mocked function is called with any other arguments it will report them and fail:
The function arguments are also presented as a JSONified string of an `Object []`. That means that it is always an array, even for zero- or one-argument functions.
Checklist
This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
Disclosing PHI is a major problem [1]. Even a small leak can be costly [2].
This PR does NOT contain germline genetic data [3], RNA-Seq, DNA methylation, microbiome or other molecular data [4].
`.png`, `.jpeg`), `.pdf`, `.RData`, `.xlsx`, `.doc`, `.ppt`, or other output files.
To automatically exclude such files using a .gitignore file, see here for example.
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have set up or verified the `main` branch protection rule following the GitHub standards before opening this pull request.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have added the major changes included in this pull request to the `CHANGELOG.md` under the next release version or unreleased, and updated the date.
Footnotes
1. UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records
2. The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.
3. Genetic information is considered PHI. Forensic assays can identify patients with as few as 21 SNPs
4. RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.