Adapt hl.de_novo function #760

Open: wants to merge 7 commits into `main`

@KoalaQin (Contributor) commented Jan 28, 2025

These are smaller functions adapted from Julia's work on combining `hl.de_novo` with Kaitlin's code.

@KoalaQin KoalaQin self-assigned this Jan 28, 2025
@KoalaQin KoalaQin requested a review from ch-kr January 28, 2025 20:29

@ch-kr (Contributor) left a comment

Thanks for adding these functions! I have some questions (many because I'm not that familiar with this work) and some suggestions.

`gnomad/sample_qc/relatedness.py` (2 resolved threads)

```python
)


def transform_pl_to_pp(pl_expr: hl.expr.ArrayExpression) -> hl.expr.ArrayExpression:
```

Contributor:

Naive question: is the `pp` here posterior probability?

Contributor Author:

It's the conditional probability of each genotype given the data, as calculated by HaplotypeCaller; we're transforming it back.
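For readers unfamiliar with the transform: PL values are phred-scaled likelihoods (PL = -10 · log10 L, normalized so the most likely genotype has PL 0), so converting back means computing 10^(-PL/10) per genotype and renormalizing so the values sum to 1. The plain-Python sketch below shows only that arithmetic; `pl_to_probabilities` is a hypothetical name, not the PR's `transform_pl_to_pp`, which works on Hail expressions (where the analogue would map over `pl_expr` with `hl.map` and divide by `hl.sum` of the result).

```python
from typing import List, Sequence


def pl_to_probabilities(pl: Sequence[int]) -> List[float]:
    """Convert phred-scaled genotype likelihoods (PL) to normalized probabilities.

    Hypothetical plain-Python illustration; not part of gnomad_methods.
    """
    # Undo the phred scaling: PL = -10 * log10(L)  =>  L = 10 ** (-PL / 10).
    likelihoods = [10 ** (-x / 10) for x in pl]
    total = sum(likelihoods)
    # Renormalize so the values sum to 1 across genotypes (i.e. a flat prior).
    return [lk / total for lk in likelihoods]


# PLs for (hom-ref, het, hom-alt) at a confidently heterozygous site.
print(pl_to_probabilities([30, 0, 300]))
```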

`gnomad/sample_qc/relatedness.py` (2 outdated, resolved threads)

```python
"HIGH",
)
.when((p_de_novo > med_conf_p) & (proband_ab > high_med_conf_ab), "MEDIUM")
.when((p_de_novo > low_conf_p) & (proband_ab >= low_conf_ab), "LOW")
```

Contributor:

I didn't dig into Kaitlin's code, but her documentation uses `>` rather than `>=` for AB:

`p_dn > 0.05 and child_AD > 0.2`

Contributor Author:

Yeah, I knew that, but `< 0.2` is failing. What happens with AD = 0.2?
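The two operators differ only at the boundary. A minimal plain-Python sketch of the LOW-confidence check, using the thresholds from the quoted documentation (`p_dn > 0.05`, AB `> 0.2`); the function name and the `inclusive_ab` switch are hypothetical, added only to compare the documented `>` with the PR's `>=`:

```python
def passes_low_conf(
    p_de_novo: float,
    proband_ab: float,
    low_conf_p: float = 0.05,
    low_conf_ab: float = 0.2,
    inclusive_ab: bool = False,
) -> bool:
    """Return True if a call meets the LOW-confidence thresholds.

    Hypothetical helper: inclusive_ab=False mirrors the documented `AB > 0.2`;
    inclusive_ab=True mirrors the PR's `proband_ab >= low_conf_ab`.
    """
    ab_ok = proband_ab >= low_conf_ab if inclusive_ab else proband_ab > low_conf_ab
    return p_de_novo > low_conf_p and ab_ok


# A proband with 2 alt reads out of 10 sits exactly on the boundary (AB = 0.2).
ab = 2 / 10
print(passes_low_conf(0.5, ab))                     # documented `>`: False
print(passes_low_conf(0.5, ab, inclusive_ab=True))  # PR's `>=`: True
```

Allele balances computed from small read counts (2/10, 3/15, ...) evaluate to exactly 0.2 as floats, so the boundary case is reachable with ordinary data and the choice of operator decides whether such calls are kept.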

```python
de_novo_prior=de_novo_prior,
)

# Determine genomic context
```

Contributor:

Rather than running this both here and in `calculate_de_novo_post_prob`, why not switch the order: run `get_genomic_context` first and pass its three return values to `calculate_de_novo_post_prob` as function arguments?
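A plain-Python sketch of the suggested ordering, where the context is computed once by the caller and passed down. The three-field `GenomicContext`, the simplified `get_genomic_context` (which ignores pseudoautosomal regions), and the placeholder body of `calculate_de_novo_post_prob` are all hypothetical; only the call order is the point.

```python
from typing import NamedTuple, Optional


class GenomicContext(NamedTuple):
    # Hypothetical stand-in for the three values get_genomic_context returns.
    diploid: bool
    hemi_x: bool
    hemi_y: bool


def get_genomic_context(chrom: str, is_female: bool) -> GenomicContext:
    # Simplified: pseudoautosomal regions (PARs) are ignored.
    on_x, on_y = chrom == "X", chrom == "Y"
    return GenomicContext(
        diploid=(not on_x and not on_y) or (on_x and is_female),
        hemi_x=on_x and not is_female,
        hemi_y=on_y and not is_female,
    )


def calculate_de_novo_post_prob(ctx: GenomicContext) -> Optional[str]:
    # Placeholder: reports which inheritance model the real calculation would
    # use, instead of computing a posterior probability.
    if ctx.diploid:
        return "diploid"
    if ctx.hemi_x:
        return "hemi_x"
    if ctx.hemi_y:
        return "hemi_y"
    return None


def call_de_novo(chrom: str, is_female: bool) -> Optional[str]:
    # Compute the context once in the caller ...
    ctx = get_genomic_context(chrom, is_female)
    # ... and pass it in, rather than recomputing it inside the callee.
    return calculate_de_novo_post_prob(ctx)


print(call_de_novo("X", is_female=False))
```

The same `ctx` can then feed any other checks in the calling function, so the genomic context is derived in exactly one place.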

```python
.or_missing()
)

parent_sum_ad_0 = (
```

Contributor:

These aren't sums; should this be renamed `parent_ad_0`, `parent_ad_0_check`, or something similar?

Contributor Author:

Great catch, it should be `hl.sum()`.
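Assuming the intent is to flag sites where either parent has no reads at all (their `AD` array sums to 0), the check reduces to summing each parent's `AD`; in Hail that would be an `hl.sum(...) == 0` comparison per parent. A plain-Python sketch of that assumed semantics, with a hypothetical function signature:

```python
from typing import Sequence


def parent_sum_ad_0(father_ad: Sequence[int], mother_ad: Sequence[int]) -> bool:
    """Flag a site where either parent's summed allele depth (AD) is 0.

    Assumed semantics for illustration only. AD holds per-allele read counts,
    e.g. [ref_reads, alt_reads] for a biallelic site.
    """
    return sum(father_ad) == 0 or sum(mother_ad) == 0


print(parent_sum_ad_0([12, 0], [0, 0]))  # mother has no reads at the site
print(parent_sum_ad_0([12, 0], [9, 1]))  # both parents covered
```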

```python
"min_dp_ratio": dp_ratio < min_dp_ratio,
"parent_sum_ad_0": parent_sum_ad_0,
"max_parent_ab": fail_max_parent_ab,
"min_proband_ab": proband_ab < min_proband_ab,
```

Contributor:

Should this be `<=` instead of `<`?

Contributor Author:

Okay, I will change it here.
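For context, a dict of named failure flags like the one in this hunk can be collapsed into the list of checks a call fails. A plain-Python sketch with the `<=` change applied to `min_proband_ab`; the helper name and both default thresholds are hypothetical:

```python
from typing import Dict, List


def failing_checks(
    dp_ratio: float,
    proband_ab: float,
    min_dp_ratio: float = 0.1,
    min_proband_ab: float = 0.2,
) -> List[str]:
    # Each value is True when the call FAILS that check.
    checks: Dict[str, bool] = {
        "min_dp_ratio": dp_ratio < min_dp_ratio,
        # `<=` also rejects calls sitting exactly on the threshold.
        "min_proband_ab": proband_ab <= min_proband_ab,
    }
    return [name for name, failed in checks.items() if failed]


print(failing_checks(dp_ratio=0.5, proband_ab=0.2))
```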

```python
locus_expr, is_female_expr
)

is_de_novo = (
```

Contributor:

The current setup means you're calculating this probability on all variants, right? Could you first filter to variants that are eligible to be de novos and then calculate probabilities? Or is there a reason you don't want to do this filter up front?
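A plain-Python sketch of the filter-first ordering: apply the cheap eligibility test (assumed here to be a heterozygous proband with two homozygous-reference parents, genotypes coded as alt-allele counts) and compute the probability only for the survivors. `expensive_posterior` is a hypothetical placeholder, and the counter just makes the saving visible. In Hail, a comparable effect could come from filtering before annotating, or from guarding the calculation with `hl.or_missing(eligible, ...)`.

```python
from typing import Dict, List, Tuple

# Counts how many times the (placeholder) posterior is computed.
calls = {"count": 0}


def expensive_posterior(trio: Tuple[int, int, int]) -> float:
    # Hypothetical stand-in for the de novo posterior calculation.
    calls["count"] += 1
    return 0.9


def is_eligible(trio: Tuple[int, int, int]) -> bool:
    # Genotypes as alt-allele counts: (proband, father, mother).
    proband, father, mother = trio
    return proband == 1 and father == 0 and mother == 0


def score_eligible(trios: List[Tuple[int, int, int]]) -> Dict[int, float]:
    # The `if` clause runs first, so the posterior is only computed for candidates.
    return {i: expensive_posterior(t) for i, t in enumerate(trios) if is_eligible(t)}


trios = [(1, 0, 0), (1, 1, 0), (0, 0, 0), (1, 0, 0)]
print(score_eligible(trios), calls["count"])
```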

@ch-kr (Contributor) commented Jan 29, 2025

I also forgot to ask in my initial review: could you add tests for the new functions?
