Adapt `hl.de_novo` function #760

Open

KoalaQin wants to merge 7 commits into main from qh/de_novo

Contributor

KoalaQin commented Jan 28, 2025 (edited)

These are smaller functions modified from Julia's work on combining hl.de_novo and Kaitlin’s code.

KoalaQin added 3 commits

January 21, 2025 17:55


          Add function to compute post probability of de novos

f24e6a5


          confidence and fail check function

3f3f8e3


          Modify de novo function

e681bdc

KoalaQin self-assigned this

KoalaQin requested a review from ch-kr

January 28, 2025 20:29

KoalaQin assigned ch-kr

KoalaQin added 2 commits

January 29, 2025 09:52


          black formatting

4ee8cdc


          Reformat docstring

376811d

KoalaQin mentioned this pull request

Generate de novo calls broadinstitute/gnomad_qc#654

Open


          Change the citation

f80698c

ch-kr requested changes

Contributor

ch-kr left a comment

thanks for adding these functions! I have some questions (many because I'm not that familiar with this work) and suggestions

gnomad/sample_qc/relatedness.py

		)


		def transform_pl_to_pp(pl_expr: hl.expr.ArrayExpression) -> hl.expr.ArrayExpression:

Contributor

ch-kr Jan 29, 2025

naive question, is the pp here posterior probability?

Contributor Author

KoalaQin Jan 30, 2025

These are the conditional probabilities of each genotype given the data, as calculated by HaplotypeCaller; we're transforming the PLs back into those probabilities.
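
For reference, a minimal plain-Python sketch of the conversion being discussed, assuming the standard Phred convention (PL = -10 * log10 of the likelihood) and a normalization step so the values sum to 1. The name `pl_to_pp` is a stand-in for illustration and is not the Hail implementation in this PR.

```python
from typing import List


def pl_to_pp(pl: List[int]) -> List[float]:
    """Convert Phred-scaled genotype likelihoods (PL) to normalized probabilities.

    Assumption: PL = -10 * log10(likelihood), one entry per genotype.
    """
    # Undo the Phred scaling to get unnormalized likelihoods.
    unnormalized = [10 ** (-x / 10) for x in pl]
    total = sum(unnormalized)
    # Normalize so the per-genotype probabilities sum to 1.
    return [u / total for u in unnormalized]


# Example PLs for the genotypes (0/0, 0/1, 1/1); the values are made up.
pp = pl_to_pp([0, 10, 100])
```

The genotype with PL = 0 always ends up with the largest probability, and each additional 10 PL units reduces a genotype's weight tenfold relative to it.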

gnomad/sample_qc/relatedness.py

+                          "HIGH",
+                      )
+                      .when((p_de_novo > med_conf_p) & (proband_ab > high_med_conf_ab), "MEDIUM")
+                      .when((p_de_novo > low_conf_p) & (proband_ab >= low_conf_ab), "LOW")

Contributor

ch-kr Jan 29, 2025

I didn't dig into Kaitlin's code, but her documentation uses > and not >= for AB

p_dn > 0.05 and child_AD > 0.2

Contributor Author

KoalaQin Jan 30, 2025

Yeah, I knew that, but < 0.2 is failing, so what happens with AD = 0.2?
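
To make the boundary case concrete, here is a small plain-Python sketch. It assumes the thresholds quoted above (0.05 for the de novo probability, 0.2 for allele balance) and a fail check of the form `proband_ab < min_proband_ab`. The helper names are hypothetical, and only the LOW tier is modeled.

```python
def fails_min_ab(proband_ab: float, min_proband_ab: float = 0.2) -> bool:
    """Fail check as written in the diff: strictly below the minimum fails."""
    return proband_ab < min_proband_ab


def is_low_confidence(
    p_de_novo: float,
    proband_ab: float,
    low_conf_p: float = 0.05,
    low_conf_ab: float = 0.2,
    inclusive: bool = True,
) -> bool:
    """LOW-tier check; `inclusive` toggles between `>=` and `>` for AB."""
    ab_ok = proband_ab >= low_conf_ab if inclusive else proband_ab > low_conf_ab
    return p_de_novo > low_conf_p and ab_ok


# A call sitting exactly on the AB threshold (p_de_novo value is made up).
ab = 0.2
passes_filter = not fails_min_ab(ab)
labeled_strict = is_low_confidence(0.5, ab, inclusive=False)
labeled_inclusive = is_low_confidence(0.5, ab, inclusive=True)
```

With a strict `>` in the confidence tier, a call at AB = 0.2 passes the fail check but receives no confidence label; with `>=` it is labeled LOW. Whichever operator is chosen, the fail check and the tier check need to agree on which side the boundary value falls.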

gnomad/sample_qc/relatedness.py

+                      de_novo_prior=de_novo_prior,
+                  )
+                  # Determine genomic context

Contributor

ch-kr Jan 29, 2025

rather than running this in this function and in calculate_de_novo_post_prob, why not switch the order and run get_genomic_context first and pass in the three return values to calculate_de_novo_post_prob as function arguments?
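
A rough sketch of the suggested ordering, in plain Python rather than Hail. The function names come from this thread, but the signatures, the three returned flags (`is_diploid`, `is_hemi_x`, `is_hemi_y`), and the simplified contig logic (PAR regions ignored) are assumptions made only for illustration.

```python
from typing import Tuple


def get_genomic_context(contig: str, is_female: bool) -> Tuple[bool, bool, bool]:
    """Hypothetical stand-in: returns (is_diploid, is_hemi_x, is_hemi_y).

    Simplified on purpose: PAR regions are ignored.
    """
    is_hemi_x = contig == "chrX" and not is_female
    is_hemi_y = contig == "chrY" and not is_female
    is_diploid = contig != "chrY" and not is_hemi_x
    return is_diploid, is_hemi_x, is_hemi_y


def calculate_de_novo_post_prob(
    is_diploid: bool, is_hemi_x: bool, is_hemi_y: bool
) -> str:
    """Placeholder: reports which model branch would be used, not a probability."""
    if is_diploid:
        return "diploid"
    if is_hemi_x:
        return "hemi_x"
    if is_hemi_y:
        return "hemi_y"
    return "missing"


def annotate_de_novo(contig: str, is_female: bool) -> str:
    # Compute the genomic context once in the caller...
    context = get_genomic_context(contig, is_female)
    # ...and pass the three values in, instead of recomputing them downstream.
    return calculate_de_novo_post_prob(*context)
```

Passing the context in as arguments keeps the probability function free of locus and sex logic, which should also make it easier to test in isolation.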

gnomad/sample_qc/relatedness.py

+                      .or_missing()
+                  )
+                  parent_sum_ad_0 = (

Contributor

ch-kr Jan 29, 2025

these aren't sums; should this be renamed parent_ad_0, parent_ad_0_check, or something else that's similar?

Contributor Author

KoalaQin Jan 30, 2025

great catch, it should be hl.sum()

gnomad/sample_qc/relatedness.py

+                      "min_dp_ratio": dp_ratio < min_dp_ratio,
+                      "parent_sum_ad_0": parent_sum_ad_0,
+                      "max_parent_ab": fail_max_parent_ab,
+                      "min_proband_ab": proband_ab < min_proband_ab,

Contributor

ch-kr Jan 29, 2025

should this be <= instead of <?

Contributor Author

KoalaQin Jan 30, 2025

Okay, I will change it here.

gnomad/sample_qc/relatedness.py

+                      locus_expr, is_female_expr
+                  )
+                  is_de_novo = (

Contributor

ch-kr Jan 29, 2025

the current setup means you're calculating this probability on all variants, right? could you filter to variants that are eligible for being de novos first and then calculate probabilities? or is there a reason you don't want to do this filter upfront?
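
The filter-first idea can be sketched in plain Python as follows. The eligibility rule used here (heterozygous proband, both parents homozygous reference, autosomal case only) and all names are assumptions for illustration; the PR's actual criteria may differ.

```python
from typing import Callable, Dict, List, Optional

Trio = Dict[str, str]


def is_candidate(trio: Trio) -> bool:
    """Assumed eligibility: het proband with both parents hom-ref."""
    return (
        trio["proband"] == "0/1"
        and trio["father"] == "0/0"
        and trio["mother"] == "0/0"
    )


def score_trios(
    trios: List[Trio], prob_fn: Callable[[Trio], float]
) -> List[Optional[float]]:
    """Call the (potentially expensive) probability function only on candidates."""
    return [prob_fn(t) if is_candidate(t) else None for t in trios]


# Track how often the probability function is actually invoked.
calls: List[Trio] = []


def dummy_prob(trio: Trio) -> float:
    calls.append(trio)
    return 0.5  # placeholder value, not a real posterior


trios = [
    {"proband": "0/1", "father": "0/0", "mother": "0/0"},  # eligible
    {"proband": "0/1", "father": "0/1", "mother": "0/0"},  # inherited
    {"proband": "0/0", "father": "0/0", "mother": "0/0"},  # no variant
]
scores = score_trios(trios, dummy_prob)
```

In Hail, the same idea could be expressed by wrapping the probability expression in a conditional such as `hl.or_missing`, so that ineligible entries are set to missing.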

Contributor

ch-kr commented Jan 29, 2025

I also forgot to ask in my initial review -- could you add tests for the new functions?


          Apply suggestions from code review

ddf3811

Co-authored-by: Katherine Chao <[email protected]>
