feat: flag and filter credible sets #879
Looks good Tobi, thanks! Have a look at my suggestion, I think it is more performant
Thank you for the tests
@@ -45,6 +45,8 @@ def __init__(
.annotate_study_type(study_index) # Add study type to study locus
.qc_redundant_top_hits_from_PICS() # Flagging top hits from studies with PICS summary statistics
.qc_explained_by_SuSiE() # Flagging credible sets in regions explained by SuSiE
# Flagging credible sets with PIP > 1 or PIP < 0.99
.qc_abnormal_pips(sum_pips_lower_threshold=0.99,sum_pips_upper_threshold=1)
I guess that we are not dropping credible sets due to the lower bound, only the upper?
Both below lower or above the upper bound are dropped, in theory - in practice I'm not so sure
Do you have an idea of how many CS have this issue?
When I last looked at gwas catalog susie I didn't find any credible sets with the issue. I guess @addramir would know more about where this has been a problem in the past
I would be happy if 0 CSs are having this issue. If not - we have a problem.
…unt for floating point errors
Implemented @ireneisdoomed's suggestion. After cross-checking against gwas catalog susie, I tweaked the upper threshold to 1.0001 rather than 1, as some sums were creeping through, likely due to floating point error.
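For readers following the threshold discussion, here is a minimal pure-Python sketch of the range check with a small tolerance on the upper bound. It is not the PR's PySpark implementation: the function names `pip_sum_out_of_range` and `count_flagged` are hypothetical, and only the bounds 0.99 and 1.0001 come from the comments above.

```python
import math
from typing import Dict, Iterable, List


def pip_sum_out_of_range(
    pips: Iterable[float],
    lower: float = 0.99,
    upper: float = 1.0001,
) -> bool:
    """Return True when the summed PIPs fall outside [lower, upper].

    Hypothetical helper for illustration only. The default upper bound
    sits slightly above 1 so that sums affected by floating point
    rounding are not flagged.
    """
    total = math.fsum(pips)  # fsum limits accumulated rounding error
    return total < lower or total > upper


def count_flagged(credible_sets: Dict[str, List[float]]) -> int:
    """Count credible sets whose PIP sum is outside the expected range."""
    return sum(pip_sum_out_of_range(pips) for pips in credible_sets.values())
```

A check like `count_flagged` is one way to answer the question of how many credible sets are affected: a result of zero on a given dataset would match the observation reported for gwas catalog susie.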
Thank you!
src/gentropy/dataset/study_locus.py
Outdated
# Flagging loci with failed studies:
.withColumn(
"qualityControls",
self.update_quality_flag(
qc_select_expression,
f.col("sumPosteriorProbability").isNotNull(),
f.col("pipOutOfRange") == "outside",
nit: this could be a boolean
True! Done!
✨ Context
This PR adds a study locus QC flag for when the sum of posterior inclusion probabilities is not in the expected range [0.99, 1]. Addresses issues/#3566
🛠 What does this PR implement
🙈 Missing
🚦 Before submitting
Branch up to date with the dev branch?
Local tests pass (make test)?
Pre-commit rules pass (poetry run pre-commit run --all-files)?