Generate de novo calls #654
base: main
some questions and minor suggestions
@@ -1015,7 +1017,33 @@ def dense_trio_mt(
 version: MatrixTableResource(
 f"{get_sample_qc_root(version, test, data_type='exomes')}"
 f"/relatedness/trios/gnomad.{data_type}.v{version}.trios"
-f"{'.releasable' if releasable else ''}.dense.mt"
+f"{'.releasable' if releasable else ''}.dense"
+f"{'.split' if split else ''}.mt"
this update means that you can no longer easily access the dense but unsplit MT, right?
No, they are both there; Julia had both versions saved already.
yes, I mean that this won't return .mt at the end of the path if you only want the dense, unsplit MT, right?
It will, like this:
from gnomad_qc.v4.resources.sample_qc import dense_trio_mt
print(dense_trio_mt().path)
print(dense_trio_mt(split=True).path)
outputting:
gs://gnomad/v4.0/sample_qc/exomes/relatedness/trios/gnomad.exomes.v4.0.trios.releasable.dense.mt
gs://gnomad/v4.0/sample_qc/exomes/relatedness/trios/gnomad.exomes.v4.0.trios.releasable.dense.split.mt
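To see why both paths end in .mt, here is a minimal standalone sketch of the suffix logic from the diff above. `dense_trio_mt_path` is a hypothetical helper, not part of gnomad_qc; the hard-coded `root` stands in for `get_sample_qc_root(...)`, and the defaults (`version="4.0"`, `releasable=True`) are assumptions inferred from the printed paths.

```python
def dense_trio_mt_path(
    version: str = "4.0",
    data_type: str = "exomes",
    releasable: bool = True,
    split: bool = False,
    # Assumption: stand-in for get_sample_qc_root(version, test, data_type="exomes").
    root: str = "gs://gnomad/v4.0/sample_qc/exomes",
) -> str:
    """Hypothetical helper mirroring the f-string pieces shown in the diff."""
    return (
        f"{root}"
        f"/relatedness/trios/gnomad.{data_type}.v{version}.trios"
        f"{'.releasable' if releasable else ''}.dense"
        # ".split" is optional, but ".mt" is always appended.
        f"{'.split' if split else ''}.mt"
    )


print(dense_trio_mt_path())
print(dense_trio_mt_path(split=True))
```

Because ".mt" sits outside the conditional, the unsplit path keeps its extension and only ".split" is toggled.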
@@ -104,52 +106,64 @@ def filter_de_novos(ht: hl.Table) -> hl.Table:
    return ht


def get_releasable_trios_dense_mt(
I think we should keep this function since it documents how the dense MT was created
It's not how the dense MT was created; that's here now, and I think we should remove them.
ah, OK, fine with me to remove if the code is saved elsewhere
logger.info("Filtering to chr20 for testing...")
mt = hl.filter_intervals(mt, [hl.parse_locus_interval("chr20")])

mt = mt.annotate_rows(
are you adding these annotations because some of these variants weren't present in our release table (and therefore won't have information in our previously calculated allele_info struct)?
Julia added this before splitting. Since the split table is already there, I kept a record of it here.
filter_samples_ht=meta_ht,
filter_variant_ht=var_ht,
checkpoint_variant_data=True,
# Approximate the AD and PL fields when missing.
ah I asked about this in the other PR -- can you add this code to the gnomAD methods function, or at least add a note that this is a way to approximate AD and PL?
also, can you add in the comment here that these are missing for a reason, and only for homref genotypes (we dropped them to save on storage space/costs)?
I think it's better to keep it here, because the approximation has to happen before converting to a trio matrix.
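For readers following along, a rough pure-Python sketch of what approximating AD and PL for a homref genotype could look like. This thread does not show the PR's actual expression, so the formulas here (AD as [DP, 0], PL as [0, GQ, 2 * GQ]) are assumptions, and `approximate_ad_pl` is a hypothetical helper, not a gnomad_methods or Hail function. It assumes split (biallelic) sites.

```python
from typing import List, Optional, Tuple


def approximate_ad_pl(
    ad: Optional[List[int]],
    pl: Optional[List[int]],
    dp: int,
    gq: int,
) -> Tuple[List[int], List[int]]:
    """Fill in AD and PL only when they are missing (hypothetical helper).

    Assumption: missing values occur only for homref genotypes, where
    AD and PL were dropped to save storage.
    """
    # Assumed approximation: all reads support the reference allele.
    approx_ad = ad if ad is not None else [dp, 0]
    # Assumed approximation: homref is most likely, het costs GQ, homvar 2 * GQ.
    approx_pl = pl if pl is not None else [0, gq, 2 * gq]
    return approx_ad, approx_pl


# Missing fields are approximated; present fields are left untouched.
print(approximate_ad_pl(None, None, dp=30, gq=60))
print(approximate_ad_pl([10, 12], [200, 0, 180], dp=22, gq=99))
```

In Hail the same idea would be an entry-level fallback applied to the MT before building the trio matrix, so that proband and parent entries all carry AD and PL.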
from gnomad.utils.slack import slack_notifications
from gnomad.utils.vep import process_consequences
from hail.methods.family_methods import trio_matrix
can you call this method rather than import it here? https://hail.is/docs/0.2/methods/genetics.html#hail.methods.trio_matrix
Thank you! I didn't know this.
tm = trio_matrix(mt, ped, complete_trios=True)
tm = tm.checkpoint(new_temp_file("trio_matrix", "mt"))

ht = tm.entries()
do you need all of the entries? can you drop some to save space?
Julia already selected entries for the dense MT here; do we want to drop "SB", etc.?
Co-authored-by: Katherine Chao <[email protected]>
This cleans up some redundant dense MT functions and arguments that are no longer needed after PR #651. Julia created the dense MT in job 0a3e98e75ba14b2f8341012515d11f8b before that PR was merged.
This is waiting on PR #760 in gnomad_methods.
Test run on chr20: fbca7f3460b84ceab02933a644d22cc2