This is just an idea that I am leaving here for the future.
AlphaFold 3, which can predict the structure of protein complexes, was recently released. I am not fully aware of how available these data are yet, but I am confident that sooner or later they will be hosted in a database (e.g., EBI AF DB) or will be easy to generate (e.g., with Boltz-1).
One possible way of exploiting this new source of information is to add a 3D clustering analysis on complexes to Oncodrive3D.
For example, we could run this as a separate analysis from the one on individual proteins (so that the larger number of tests does not penalize the FDR of the individual-protein results). We would need to map every mutation in a gene to the structure of every complex that includes that gene product. For the mutation profile, we would concatenate the profiles of the individual proteins, following the order of the protein sequences in the structure of the complex. We could use the same `seq_df` that we use for individual proteins to get the `miss_mut_prob` vector of the complex (this step might need some tweaking). Then we would run the clustering on each complex as if it were an individual protein. It would probably make sense to keep only clusters whose mutations come from two or more proteins of the complex (basically, looking only for clusters in regions of contact/interaction between the proteins). Finally, we could write the results to additional output files specific to protein complexes (e.g., `<cohort>.3d_clustering_complexes_pos.csv` and `<cohort>.3d_clustering_complexes_gene.csv`).
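To make the concatenation and the cross-protein filter more concrete, here is a minimal sketch. It is not Oncodrive3D code: the function names (`build_complex_profile`, `to_complex_index`, `spans_multiple_chains`) and the input layout (a dict of per-protein `miss_mut_prob` vectors, ordered as in the complex structure) are assumptions made only for illustration.

```python
import numpy as np

def build_complex_profile(chain_probs):
    """Concatenate per-protein probability vectors in chain order.

    chain_probs: dict {gene: per-residue missense probabilities}, ordered
    as the chains appear in the complex structure (hypothetical layout).
    """
    genes = list(chain_probs)
    lengths = [len(chain_probs[g]) for g in genes]
    # Start index of each chain in the concatenated vector.
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    offsets = {g: int(s) for g, s in zip(genes, starts)}
    profile = np.concatenate(
        [np.asarray(chain_probs[g], dtype=float) for g in genes]
    )
    # Chain label of every residue in the complex.
    chain_of = np.repeat(genes, lengths)
    # Plain renormalization. Assumption: how to weight chains against each
    # other is exactly the part that "might need some tweaking".
    return profile / profile.sum(), offsets, chain_of

def to_complex_index(gene, pos, offsets):
    """Map a 1-based protein position to a 0-based index in the complex."""
    return offsets[gene] + pos - 1

def spans_multiple_chains(cluster_indices, chain_of):
    """True if the mutated residues of a cluster come from >= 2 chains."""
    return len(set(chain_of[list(cluster_indices)])) >= 2

# Toy example with two made-up chains.
chain_probs = {"A": [0.5, 0.5], "B": [0.25, 0.25, 0.5]}
profile, offsets, chain_of = build_complex_profile(chain_probs)
idx = to_complex_index("B", 2, offsets)
```

With per-protein vectors that each sum to 1, the plain renormalization gives every chain the same total probability, which may not be what we want if the proteins differ a lot in mutability.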
Besides providing useful insight and additional detection power (we would need to discuss how to integrate the individual-gene results with the complex results), this would be very useful for the interpretation and detection of driver mutations (BoostDM or BoostDM-3D).
For example, to enhance the prediction (and interpretation) of driver mutations, we could use as a feature the presence of a cluster in any complex (e.g., with any other protein). Or we could encode multiple features (which could be selected per gene), one for clusters in each specific complex (target gene-TP53, etc.).
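As a rough illustration of the two encodings, a sketch could look like the following. Everything here is hypothetical: the function `encode_complex_features`, the `(gene, pos, partner)` layout of the cluster hits, and the toy values are assumptions for the example, not the BoostDM feature format or real results.

```python
def encode_complex_features(mutations, cluster_hits, partners_of_interest):
    """Build per-mutation features from complex-level clustering results.

    mutations: list of (gene, pos) tuples.
    cluster_hits: set of (gene, pos, partner) tuples, meaning that the
        position falls in a cluster found in the complex gene-partner
        (hypothetical layout).
    partners_of_interest: partners that get a dedicated feature column.
    """
    rows = []
    for gene, pos in mutations:
        # All partners for which this position is in a complex cluster.
        partners = {p for g, q, p in cluster_hits if g == gene and q == pos}
        row = {
            "gene": gene,
            "pos": pos,
            # General feature: cluster in a complex with any other protein.
            "in_any_complex_cluster": int(bool(partners)),
        }
        # Specific features: one column per selected partner.
        for partner in partners_of_interest:
            row[f"cluster_with_{partner}"] = int(partner in partners)
        rows.append(row)
    return rows

# Toy input (made-up gene name and positions).
cluster_hits = {("GENE_A", 54, "TP53")}
mutations = [("GENE_A", 54), ("GENE_A", 100)]
features = encode_complex_features(mutations, cluster_hits, ["TP53"])
```

The general column keeps the feature space small, while the partner-specific columns would let us select, for each gene, the complexes that matter.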