Require filtered annotations in filtered genes #4664
base: dev
@@ -958,7 +964,7 @@ def _get_allowed_consequence_ids(self, annotations):

     def _get_allowed_transcripts(self, ht, allowed_consequence_ids):
         transcript_filter = self._get_allowed_transcripts_filter(allowed_consequence_ids)
-        return ht[self.TRANSCRIPTS_FIELD].filter(transcript_filter)
+        return getattr(ht, FILTERED_GENE_TRANSCRIPTS, ht[self.TRANSCRIPTS_FIELD]).filter(transcript_filter)
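This is the main change for the desired functionality in this PR: when the table carries the filtered-gene transcripts, the allowed transcripts are drawn from those instead of from all transcripts.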
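To illustrate the fallback, here is a minimal plain-Python sketch, not the Hail code from this PR. It assumes a row object standing in for the table, a made-up value for `FILTERED_GENE_TRANSCRIPTS` (the real constant's value is not shown in this diff), and a hypothetical `consequence_ids` key on each transcript.

```python
from types import SimpleNamespace

# Assumed values for illustration only; the real constants are not shown in the diff.
FILTERED_GENE_TRANSCRIPTS = "filtered_gene_transcripts"
TRANSCRIPTS_FIELD = "sorted_transcript_consequences"


def get_allowed_transcripts(row, allowed_consequence_ids):
    """Return transcripts with an allowed consequence, preferring the gene-filtered list."""
    # Use the gene-filtered transcripts when present, otherwise all transcripts.
    transcripts = getattr(row, FILTERED_GENE_TRANSCRIPTS, getattr(row, TRANSCRIPTS_FIELD))
    return [
        t for t in transcripts
        if set(t["consequence_ids"]) & set(allowed_consequence_ids)
    ]


in_gene = {"gene_id": "G1", "consequence_ids": [1]}
other_gene = {"gene_id": "G2", "consequence_ids": [1]}

# No gene filter applied: both transcripts can satisfy the annotation filter.
unfiltered_row = SimpleNamespace(sorted_transcript_consequences=[in_gene, other_gene])
# Gene filter applied: only transcripts in the filtered gene are considered.
filtered_row = SimpleNamespace(
    sorted_transcript_consequences=[in_gene, other_gene],
    filtered_gene_transcripts=[in_gene],
)
```

With a gene filter in place, a variant whose only matching consequence sits in a gene outside the filter no longer passes the annotation filter.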
@@ -1001,7 +1007,7 @@ def _filter_compound_hets(self):
         def key(v):
             ks = [v[k] for k in self.KEY_FIELD]
             return ks[0] if len(self.KEY_FIELD) == 1 else hl.tuple(ks)
-        ch_ht = ch_ht.annotate(key_=key(ch_ht.row), gene_ids=self._gene_ids_expr(ch_ht))
+        ch_ht = ch_ht.annotate(key_=key(ch_ht.row), gene_ids=self._gene_ids_expr(ch_ht, filtered_genes_only=True))
This change means that if a gene filter has been applied, we only compute possible comp het pairs within those genes.
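A rough plain-Python sketch of the effect on pairing, assuming variants are dicts with hypothetical `key` and `gene_ids` fields. It only enumerates candidate pairs that share a gene; the inheritance checks that the real comp het search applies are left out, and the internals of `_gene_ids_expr` are not shown in this diff.

```python
from collections import defaultdict
from itertools import combinations


def candidate_comp_het_pairs(variants, filtered_gene_ids=None):
    """Pair up variants that share a gene, optionally restricted to filtered genes."""
    keys_by_gene = defaultdict(list)
    for variant in variants:
        gene_ids = set(variant["gene_ids"])
        if filtered_gene_ids is not None:
            # Only group on genes that passed the gene filter.
            gene_ids &= set(filtered_gene_ids)
        for gene_id in gene_ids:
            keys_by_gene[gene_id].append(variant["key"])

    pairs = set()
    for keys in keys_by_gene.values():
        pairs.update(combinations(sorted(keys), 2))
    return sorted(pairs)


variants = [
    {"key": "v1", "gene_ids": ["G1", "G2"]},
    {"key": "v2", "gene_ids": ["G1"]},
    {"key": "v3", "gene_ids": ["G2"]},
]
```

Without a gene filter, `v1` pairs with both `v2` (in G1) and `v3` (in G2); with the filter set to G1, only the `v1`/`v2` pair is considered.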
@@ -84,11 +85,44 @@ def get_allowed_sv_type_ids(self, sv_types):
             type.replace(self.SV_TYPE_PREFIX, '') for type in sv_types if type.startswith(self.SV_TYPE_PREFIX)
         ])

+    def _annotate_families_table_annotations(self, families_ht, annotations_ht, is_comp_het=False):
For gCNV data, we have an entry-level gene_ids annotation, because the breakpoints for the SV in the annotation table can differ from those of an individual call. The changes in this file ensure that gene and annotation filters only consider the genes present in the entries, and not all the genes. This function is added because we also recently added logic to do an annotation-first search if genes are present, which means that at the time we do the gene and annotation filtering we don't have the entries available. In that case we need a post-processing step to re-filter based on the filtered genes.
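The post-processing step could look roughly like the following plain-Python sketch. It is an assumption-laden stand-in for the Hail logic: the row layout (`annotation_gene_ids`, `entries`, and a per-entry `gene_ids` list) is hypothetical, and the rule "keep a row if any entry overlaps a filtered gene" is one plausible reading of the comment above, not the confirmed behavior.

```python
def refilter_by_entry_genes(rows, filtered_gene_ids):
    """Keep rows whose entry-level genes overlap the filtered genes."""
    filtered_gene_ids = set(filtered_gene_ids)
    kept = []
    for row in rows:
        # Collect genes from the individual calls, not from the annotation table.
        entry_gene_ids = set()
        for entry in row["entries"]:
            entry_gene_ids.update(entry["gene_ids"])
        if entry_gene_ids & filtered_gene_ids:
            kept.append(row)
    return kept


rows = [
    # The annotation-level SV spans G1 and G2, but the individual call only touches G2.
    {"variant_id": "cnv1", "annotation_gene_ids": ["G1", "G2"], "entries": [{"gene_ids": ["G2"]}]},
    {"variant_id": "cnv2", "annotation_gene_ids": ["G1"], "entries": [{"gene_ids": ["G1"]}]},
]
```

Here an annotation-first search for G1 would match both rows, but after re-filtering on the entries only `cnv2` remains, since no call for `cnv1` actually overlaps G1.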
return hl.or_else(matched_transcript, main_transcript)
main_transcript = getattr(ht, FILTERED_GENE_TRANSCRIPTS, ht.sorted_transcript_consequences).first()
I like how the addition of filtering transcripts by genes earlier on makes selecting the main transcript a little simpler
No description provided.