Generate de novo calls #654

Open
wants to merge 6 commits into
base: main

Conversation

@KoalaQin KoalaQin commented Jan 29, 2025

This PR cleans up some redundant dense MT functions and arguments now that we have PR #651. Julia created the dense MT in job 0a3e98e75ba14b2f8341012515d11f8b before that PR was merged.

This is waiting on PR #760 in gnomad_methods.

Test run on chr20: fbca7f3460b84ceab02933a644d22cc2

@ch-kr ch-kr left a comment

some questions and minor suggestions

@@ -1015,7 +1017,33 @@ def dense_trio_mt(
 version: MatrixTableResource(
 f"{get_sample_qc_root(version, test, data_type='exomes')}"
 f"/relatedness/trios/gnomad.{data_type}.v{version}.trios"
-f"{'.releasable' if releasable else ''}.dense.mt"
+f"{'.releasable' if releasable else ''}.dense"
+f"{'.split' if split else ''}.mt"
Contributor

this update means that you can no longer easily access the dense but unsplit MT, right?

Contributor Author

No, they are both there; Julia had both versions saved already.

Contributor

yes, I mean that this won't return .mt at the end of the path if you only want the dense, unsplit MT, right?

Contributor Author

It will, like this:

from gnomad_qc.v4.resources.sample_qc import dense_trio_mt

print(dense_trio_mt().path)
print(dense_trio_mt(split=True).path)

outputting:
gs://gnomad/v4.0/sample_qc/exomes/relatedness/trios/gnomad.exomes.v4.0.trios.releasable.dense.mt
gs://gnomad/v4.0/sample_qc/exomes/relatedness/trios/gnomad.exomes.v4.0.trios.releasable.dense.split.mt
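
To make the path logic in this exchange concrete, here is a minimal standalone sketch of how the two suffixes compose. It is not the gnomad_qc implementation: `dense_trio_mt_path` and its `root` parameter are hypothetical names, and the root is passed in directly rather than computed by `get_sample_qc_root`. Only the f-string pieces are taken from the diff above.

```python
def dense_trio_mt_path(
    root: str,
    data_type: str = "exomes",
    version: str = "4.0",
    releasable: bool = True,
    split: bool = False,
) -> str:
    """Build a dense trio MT path (hypothetical helper, for illustration only)."""
    return (
        f"{root}/relatedness/trios/gnomad.{data_type}.v{version}.trios"
        # Optional '.releasable' marker, followed by the fixed '.dense' part.
        f"{'.releasable' if releasable else ''}.dense"
        # '.split' is added only on request; '.mt' always ends the path.
        f"{'.split' if split else ''}.mt"
    )


root = "gs://gnomad/v4.0/sample_qc/exomes"
print(dense_trio_mt_path(root))
print(dense_trio_mt_path(root, split=True))
```

Because `.mt` is appended after the optional `.split` piece, the unsplit path still ends in `.dense.mt`, which matches the output quoted above.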

@@ -104,52 +106,64 @@ def filter_de_novos(ht: hl.Table) -> hl.Table:
return ht


def get_releasable_trios_dense_mt(
Contributor

I think we should keep this function since it documents how the dense MT was created

Contributor Author

It's not how the dense MT was created; that code is here now, so I think we should remove these functions.

Contributor

ah, OK, fine with me to remove if the code is saved elsewhere

gnomad_qc/v4/create_release/create_de_novo_release.py
logger.info("Filtering to chr20 for testing...")
mt = hl.filter_intervals(mt, [hl.parse_locus_interval("chr20")])

mt = mt.annotate_rows(
Contributor

are you adding these annotations because some of these variants weren't present in our release table (and therefore won't have information in our previously calculated allele_info struct)?

Contributor Author

Julia added this before splitting. Since the split table is already there, I kept it here as a record.

filter_samples_ht=meta_ht,
filter_variant_ht=var_ht,
checkpoint_variant_data=True,
# Approximate the AD and PL fields when missing.
Contributor

ah I asked about this in the other PR -- can you add this code to the gnomAD methods function, or at least add a note that this is a way to approximate AD and PL?

Contributor

also, can you add in the comment here that these are missing for a reason, and only for homref genotypes (we dropped them to save on storage space/costs)?

Contributor Author

I think it's better to keep it here because the approximation happens before converting to a trio matrix.
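
For readers following this thread, a rough plain-Python sketch of what "approximating AD and PL when missing" could look like for a single biallelic genotype. The actual formula used in the PR is not shown in this conversation, so everything here is an assumption: the function name `approximate_ad_pl` is hypothetical, AD is approximated as `[DP, 0]`, and PL as `[0, GQ, 2 * GQ]`. Only hom-ref genotypes are touched, following the review comment that these fields were dropped for homref genotypes to save storage.

```python
from typing import List, Optional, Tuple


def approximate_ad_pl(
    gt: Tuple[int, int],
    dp: Optional[int],
    gq: Optional[int],
    ad: Optional[List[int]],
    pl: Optional[List[int]],
) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    """Fill in missing AD/PL for a hom-ref genotype (illustrative assumption)."""
    if gt != (0, 0):
        # Non-hom-ref genotypes are expected to keep their original fields.
        return ad, pl
    if ad is None and dp is not None:
        # Assumption: all reads support the reference allele.
        ad = [dp, 0]
    if pl is None and gq is not None:
        # Assumption: het likelihood equals GQ, hom-alt is twice that.
        pl = [0, gq, 2 * gq]
    return ad, pl


# Hom-ref call with AD and PL dropped; both are rebuilt from DP and GQ.
print(approximate_ad_pl((0, 0), 30, 40, None, None))
# Het call with fields present; returned unchanged.
print(approximate_ad_pl((0, 1), 30, 99, [15, 15], [99, 0, 99]))
```

Doing this per entry before building the trio matrix means the proband, father, and mother entries all carry usable AD and PL values, which is the ordering argument made in the reply above.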

from gnomad.utils.slack import slack_notifications
from gnomad.utils.vep import process_consequences
from hail.methods.family_methods import trio_matrix
Contributor

can you call this method rather than import it here? https://hail.is/docs/0.2/methods/genetics.html#hail.methods.trio_matrix

Contributor Author

Thank you! I didn't know this.

tm = trio_matrix(mt, ped, complete_trios=True)
tm = tm.checkpoint(new_temp_file("trio_matrix", "mt"))

ht = tm.entries()
Contributor

do you need all of the entries? can you drop some to save space?

Contributor Author

Julia already selected entries for the dense MT here. Do we want to drop "SB", etc.?
