
Require filtered annotations in filtered genes #4664

Open: wants to merge 11 commits into base: dev


@hanars (Collaborator) commented on Feb 20, 2025

No description provided.

@hanars changed the title from "Gene annotation filter" to "Require filtered annotations in filtered genes" on Feb 20, 2025
```diff
@@ -958,7 +964,7 @@ def _get_allowed_consequence_ids(self, annotations):
 
     def _get_allowed_transcripts(self, ht, allowed_consequence_ids):
         transcript_filter = self._get_allowed_transcripts_filter(allowed_consequence_ids)
-        return ht[self.TRANSCRIPTS_FIELD].filter(transcript_filter)
+        return getattr(ht, FILTERED_GENE_TRANSCRIPTS, ht[self.TRANSCRIPTS_FIELD]).filter(transcript_filter)
```
Collaborator Author


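This is the main change behind the functionality in this PR: when a gene filter has produced gene-filtered transcripts, the annotation filter is applied to those transcripts instead of to all transcripts.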
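A minimal plain-Python sketch of the fallback pattern in the new `_get_allowed_transcripts` line, assuming rows are simple objects rather than Hail tables. `allowed_transcripts`, the `transcripts` attribute, and the value `"filtered_gene_transcripts"` for `FILTERED_GENE_TRANSCRIPTS` are hypothetical stand-ins; the point is only that `getattr` with a default uses the gene-filtered transcripts when a gene filter has attached them, and otherwise falls back to every transcript.

```python
from types import SimpleNamespace

# Hypothetical value standing in for the FILTERED_GENE_TRANSCRIPTS constant.
FILTERED_GENE_TRANSCRIPTS = "filtered_gene_transcripts"


def allowed_transcripts(row, allowed_consequence_ids):
    """Keep transcripts whose consequence is allowed.

    If a gene filter attached gene-filtered transcripts to the row, only those
    are candidates; otherwise every transcript on the row is a candidate.
    """
    candidates = getattr(row, FILTERED_GENE_TRANSCRIPTS, row.transcripts)
    return [t for t in candidates if t["consequence_id"] in allowed_consequence_ids]


transcripts = [
    {"gene_id": "GENE_A", "consequence_id": 1},
    {"gene_id": "GENE_B", "consequence_id": 1},
    {"gene_id": "GENE_B", "consequence_id": 2},
]
# No gene filter applied: the row carries only its full transcript list.
unfiltered = SimpleNamespace(transcripts=transcripts)
# Gene filter on GENE_A applied: the row also carries the gene-filtered subset.
gene_filtered = SimpleNamespace(
    transcripts=transcripts,
    **{FILTERED_GENE_TRANSCRIPTS: [t for t in transcripts if t["gene_id"] == "GENE_A"]},
)
```

With the gene filter on `GENE_A`, consequence `2` no longer matches because it only occurs in `GENE_B`, which is the "filtered annotations in filtered genes" requirement from the PR title.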

```diff
@@ -1001,7 +1007,7 @@ def _filter_compound_hets(self):
         def key(v):
             ks = [v[k] for k in self.KEY_FIELD]
             return ks[0] if len(self.KEY_FIELD) == 1 else hl.tuple(ks)
-        ch_ht = ch_ht.annotate(key_=key(ch_ht.row), gene_ids=self._gene_ids_expr(ch_ht))
+        ch_ht = ch_ht.annotate(key_=key(ch_ht.row), gene_ids=self._gene_ids_expr(ch_ht, filtered_genes_only=True))
```
Collaborator Author


With this change, if a gene filter has been applied, we only compute possible compound het pairs within those genes.
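A plain-Python sketch of the effect described above, assuming variants are given as a mapping from variant key to the set of gene IDs they overlap. `comp_het_candidate_pairs` and its parameters are hypothetical and do not mirror how `_gene_ids_expr(..., filtered_genes_only=True)` is implemented; the sketch only shows that intersecting each variant's genes with the gene filter before grouping means no pair is ever formed in a gene outside the filter.

```python
from collections import defaultdict
from itertools import combinations


def comp_het_candidate_pairs(variants, gene_filter=None):
    """Return the set of within-gene variant pairs.

    variants: mapping of variant key -> set of gene IDs the variant overlaps.
    gene_filter: optional set of gene IDs; when given, genes outside it are
    dropped before grouping, so pairs are only formed within filtered genes.
    """
    by_gene = defaultdict(set)
    for key, gene_ids in variants.items():
        if gene_filter is not None:
            gene_ids = gene_ids & gene_filter  # keep only filtered genes
        for gene_id in gene_ids:
            by_gene[gene_id].add(key)
    pairs = set()
    for keys in by_gene.values():
        # Sort so each pair is emitted in a stable order.
        pairs.update(combinations(sorted(keys), 2))
    return pairs


variants = {"v1": {"GENE_A", "GENE_B"}, "v2": {"GENE_B"}, "v3": {"GENE_A"}}
```

Without a filter, `v1` pairs with `v3` (via `GENE_A`) and with `v2` (via `GENE_B`); filtering to `GENE_A` leaves only the `v1`/`v3` pair.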

```diff
@@ -84,11 +85,44 @@ def get_allowed_sv_type_ids(self, sv_types):
             type.replace(self.SV_TYPE_PREFIX, '') for type in sv_types if type.startswith(self.SV_TYPE_PREFIX)
         ])
 
+    def _annotate_families_table_annotations(self, families_ht, annotations_ht, is_comp_het=False):
```
Collaborator Author


For gCNV data, we have an entry-level gene_ids annotation, because the breakpoints of an individual call can differ from those of the SV in the annotations table. The changes in this file ensure that gene and annotation filters only consider the genes present in the entries, not all the genes annotated on the SV. This function is added because we also recently added logic to do an annotation-first search when genes are present. In that case, the entries are not yet available when we do the gene and annotation filtering, so we need a post-processing step that re-filters based on the filtered genes.
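A plain-Python sketch of the two-pass flow described above, assuming SV-level gene IDs come from the annotations table and entry-level gene IDs are keyed by `(sv, sample)`. Both function names and all data shapes are hypothetical; this is not the code in `_annotate_families_table_annotations`, only an illustration of why a second pass is needed once entries are joined.

```python
def annotation_first_prefilter(sv_gene_ids, gene_filter):
    """First pass, before entries are available: keep SVs whose
    annotations-table gene IDs overlap the gene filter."""
    return {sv for sv, genes in sv_gene_ids.items() if genes & gene_filter}


def refilter_entries(candidate_svs, entry_gene_ids, gene_filter):
    """Post-processing once entries are joined: keep (sv, sample) calls whose
    entry-level gene IDs still overlap the gene filter, recording only the
    overlapping genes for downstream annotation filtering."""
    kept = {}
    for (sv, sample), genes in entry_gene_ids.items():
        if sv not in candidate_svs:
            continue
        overlapping = genes & gene_filter
        if overlapping:
            kept[(sv, sample)] = overlapping
    return kept


sv_gene_ids = {"cnv1": {"GENE_A", "GENE_B"}, "cnv2": {"GENE_C"}}
entry_gene_ids = {
    ("cnv1", "S1"): {"GENE_A", "GENE_B"},
    ("cnv1", "S2"): {"GENE_B"},  # narrower breakpoints in this sample
    ("cnv2", "S1"): {"GENE_C"},
}
gene_filter = {"GENE_A"}
candidates = annotation_first_prefilter(sv_gene_ids, gene_filter)
kept = refilter_entries(candidates, entry_gene_ids, gene_filter)
```

Here `cnv1` passes the annotation-first filter because its table-level genes include `GENE_A`, but sample `S2`'s call does not reach `GENE_A`, so only the `S1` call survives the post-processing step.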


```diff
-        return hl.or_else(matched_transcript, main_transcript)
+        main_transcript = getattr(ht, FILTERED_GENE_TRANSCRIPTS, ht.sorted_transcript_consequences).first()
```
Contributor


I like how filtering transcripts by gene earlier on makes selecting the main transcript a little simpler.
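A plain-Python sketch of the simplified selection in the new line, assuming transcripts are already sorted by priority and that a gene filter, when applied, attaches a gene-filtered subset to the row. `main_transcript`, `first_or_none`, the transcript dicts, and the value `"filtered_gene_transcripts"` for `FILTERED_GENE_TRANSCRIPTS` are hypothetical; the sketch just takes the first element of whichever list applies.

```python
from types import SimpleNamespace

# Hypothetical value standing in for the FILTERED_GENE_TRANSCRIPTS constant.
FILTERED_GENE_TRANSCRIPTS = "filtered_gene_transcripts"


def first_or_none(items):
    """Return the first item, or None for an empty list (sketch convention)."""
    return items[0] if items else None


def main_transcript(row):
    """Pick the top-ranked transcript, restricted to the gene-filtered
    transcripts when a gene filter has attached them to the row."""
    return first_or_none(getattr(row, FILTERED_GENE_TRANSCRIPTS, row.sorted_transcript_consequences))


sorted_transcripts = [
    {"gene_id": "GENE_B", "transcript_id": "T1"},
    {"gene_id": "GENE_A", "transcript_id": "T2"},
]
no_filter = SimpleNamespace(sorted_transcript_consequences=sorted_transcripts)
with_filter = SimpleNamespace(
    sorted_transcript_consequences=sorted_transcripts,
    **{FILTERED_GENE_TRANSCRIPTS: [t for t in sorted_transcripts if t["gene_id"] == "GENE_A"]},
)
```

Because the gene-filtered list preserves the original sort order, its first element is already the best transcript within the filtered genes, so no separate "matched" fallback is needed.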


2 participants