Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor summary edits #11

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Minor summary edits #11

wants to merge 3 commits into from

Conversation

juhaa
Copy link
Contributor

@juhaa juhaa commented Dec 1, 2021

  • Added sorting & tabixing of coding variants
  • Moved the whole summary part to a subworkflow
  • Update the config json with R8 file paths

header.extend(["fin.AF","nfsee.AF","rsid","gene_most_severe","most_severe"])
summary_outfile.write("\t".join(header)+"\n")
#add pheno, fin enrichment
header.extend(["fin.enrichment","phenotype"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last 2 columns are merged iin the provided results, but looks fine here
gsutil cat gs://fg-cromwell_fresh/regenie_summaries/29e318d8-92dd-435c-9c41-4c17f200157d/call-post_summary/summary.regenie_summary/164d4497-6b05-4055-b0af-e704f2b4b353/call-gather/coding_variants.txt.gz | zcat | head

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean, seems fine to me:
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wtf..... dunno how on earth they were shown together... must be some terminal quirkiness. anyhoo

@@ -46,7 +19,7 @@ workflow regenie {
}
}

call coding_gather {
input: files=sub_step2.coding
call summary.regenie_summary as post_summary {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh oh this is gonna be shitton of localization.... but the import wdl I meant that we have the 'tasks' i.e. functions there so we don't have the summary task in multiple places, when adding the summarization workflow to pregiven sumstats. I would do the post-processing of single sumstats within the step2. import summary task from a separate wdl file (I assume that this is possible without having any workflows there???)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is there any excess localization here which wouldn't happen before? Anyways the single sumstats would have to be first localized to the summary task and to put them all together would require localizing them all in the single task, right? Or am I missing something?

Also, I was under the assumption that it is only possible to import and call workflows only, not just the tasks. I'll need to have a look if that works.

Maybe a topic for discussion in the sw meeting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaa yeah it seems to be possible to call imported tasks directly without first having to call the workflow. Or at least womtool validate seemed to be ok with it. This simplifies things...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea previously the summarization happened in a single phenotype step. Now you are summarizing all of the full summarystats which is gonna be localizing terabyte+ to a single machine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want the summary to happen in a single pheno step so then we can just gather the summaries, which is gonna be localizing very small files per pheno

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since it's calling the subworkflow which has as first step the summary task in a scatter, it doesn't localize them all on a single machine but rather follows the scatter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants