-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor summary edits #11
base: master
Are you sure you want to change the base?
Conversation
juhaa
commented
Dec 1, 2021
- Added sorting & tabixing of coding variants
- Moved the whole summary part to a subworkflow
- Update the config json with R8 file paths
header.extend(["fin.AF","nfsee.AF","rsid","gene_most_severe","most_severe"]) | ||
summary_outfile.write("\t".join(header)+"\n") | ||
#add pheno, fin enrichment | ||
header.extend(["fin.enrichment","phenotype"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last 2 columns are merged iin the provided results, but looks fine here
gsutil cat gs://fg-cromwell_fresh/regenie_summaries/29e318d8-92dd-435c-9c41-4c17f200157d/call-post_summary/summary.regenie_summary/164d4497-6b05-4055-b0af-e704f2b4b353/call-gather/coding_variants.txt.gz | zcat | head
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wtf..... dunno how on earth they were shown together... must be some terminal quirkiness. anyhoo
@@ -46,7 +19,7 @@ workflow regenie { | |||
} | |||
} | |||
|
|||
call coding_gather { | |||
input: files=sub_step2.coding | |||
call summary.regenie_summary as post_summary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh oh this is gonna be shitton of localization.... but the import wdl I meant that we have the 'tasks' i.e. functions there so we don't have the summary task in multiple places, when adding the summarization workflow to pregiven sumstats. I would do the post-processing of single sumstats within the step2. import summary
task from a separate wdl file (I assume that this is possible without having any workflows there???)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is there any excess localization here which wouldn't happen before? Anyways the single sumstats would have to be first localized to the summary task and to put them all together would require localizing them all in the single task, right? Or am I missing something?
Also, I was under the assumption that it is only possible to import and call workflows only, not just the tasks. I'll need to have a look if that works.
Maybe a topic for discussion in the sw meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaa yeah it seems to be possible to call imported tasks directly without first having to call the workflow. Or at least womtool validate seemed to be ok with it. This simplifies things...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea previously the summarization happened in a single phenotype step. Now you are summarizing all of the full summarystats which is gonna be localizing terabyte+ to a single machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want the summary to happen in a single pheno step so then we can just gather the summaries, which is gonna be localizing very small files per pheno
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually since it's calling the subworkflow which has as first step the summary task in a scatter, it doesn't localize them all on a single machine but rather follows the scatter.