3D clustering on protein complexes #52

Open
St3451 opened this issue Nov 20, 2024 · 1 comment
Labels
discussion (Require discussion), enhancement (New feature or request)

Comments

Collaborator

St3451 commented Nov 20, 2024

This is just an idea that I am leaving here for the future.
AlphaFold 3, which can predict the structure of protein complexes, was released recently. I am not sure how widely available these predictions are yet, but I am confident that sooner or later they will be hosted in a database (e.g., EBI AF DB) or will be easy to generate (e.g., with Boltz-1).

One possible way to exploit this new source of information is to add a 3D clustering analysis on complexes to Oncodrive3D.
For example, we could run it as an analysis independent from the one on individual proteins (so that the individual-protein results are not affected by a more heavily penalized FDR). We would need to map every mutation in a gene to the structure of each complex that includes that gene product. For the mutation profile, we would need to concatenate the profiles of the individual proteins in the order in which their sequences appear in the structure of the complex. We could use the same seq_df we use for individual proteins to get the miss_mut_prob vector of the complex (this step might need some tweaking). Then, we would run the clustering on each complex as if it were an individual protein. It would probably make sense to consider only clusters whose mutations fall on two or more proteins of the complex (basically, looking only for clusters in regions of contact/interaction between the proteins). Finally, we could write the result as an additional output specific to protein complexes (e.g., <cohort>.3d_clustering_complexes_pos.csv and <cohort>.3d_clustering_complexes_gene.csv).
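
To make the bookkeeping concrete, here is a minimal sketch of the index handling, not actual Oncodrive3D code. All function names are hypothetical, the per-chain probability vectors stand in for whatever miss_mut_prob would hold, and the simple renormalization is an assumption (it is exactly the step that "might need some tweaking", e.g. to weight chains by length or mutation count).

```python
from bisect import bisect_right
from itertools import accumulate


def chain_offsets(chain_lengths):
    """Return {chain_id: 0-based start index} for chains laid out in dict order."""
    ids = list(chain_lengths)
    ends = list(accumulate(chain_lengths[c] for c in ids))
    starts = [0] + ends[:-1]
    return dict(zip(ids, starts))


def to_complex_pos(chain_id, pos, offsets):
    """Map a 1-based position within a chain to a 1-based position in the complex."""
    return offsets[chain_id] + pos


def chain_of(complex_pos, offsets):
    """Return the chain that a 1-based complex position belongs to."""
    ids = list(offsets)
    starts = [offsets[c] for c in ids]
    return ids[bisect_right(starts, complex_pos - 1) - 1]


def concat_profile(per_chain_prob):
    """Concatenate per-chain probability vectors and renormalize to sum to 1.

    Assumption: plain renormalization; the real weighting is an open question.
    """
    flat = [p for probs in per_chain_prob.values() for p in probs]
    total = sum(flat)
    return [p / total for p in flat]


def is_interface_cluster(cluster_positions, offsets):
    """True if the cluster contains positions from at least two chains."""
    return len({chain_of(p, offsets) for p in cluster_positions}) >= 2
```

For a toy complex with `chain_lengths = {"A": 3, "B": 2}`, position 1 of chain B maps to complex position 4, and a cluster at complex positions 3 and 4 would be kept because it spans both chains, while a cluster at positions 2 and 3 would be dropped.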

Besides providing useful insight and additional detection power (we need to discuss how to integrate the individual-gene results with the complex results), it would be very useful for the detection and interpretation of driver mutations (BoostDM or BoostDM-3D).
For example, to improve the prediction (and interpretation) of driver mutations, we could use as a feature the presence of a cluster in any complex (e.g., with any other protein). Or we could encode multiple features (which could be selected for each gene), one per specific complex (target gene-TP53, etc.).
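
The two encodings could look roughly like the sketch below. This is only an illustration: the function name, the input layout (tuples of gene, position, partner) and the feature names are all hypothetical and do not reflect the BoostDM feature format.

```python
def complex_cluster_features(mutations, cluster_hits, partners_of_interest):
    """Build per-mutation features from complex-level clustering hits.

    mutations: iterable of (gene, pos)
    cluster_hits: iterable of (gene, pos, partner_gene), one entry for each
        position found in a significant cluster within a complex with partner_gene
    partners_of_interest: partner genes that get a dedicated feature
    """
    # Collect, for every mutated site, the partners it clusters with
    partners_by_site = {}
    for gene, pos, partner in cluster_hits:
        partners_by_site.setdefault((gene, pos), set()).add(partner)

    rows = []
    for gene, pos in mutations:
        partners = partners_by_site.get((gene, pos), set())
        # General feature: cluster in a complex with any partner
        row = {"gene": gene, "pos": pos,
               "in_any_complex_cluster": int(bool(partners))}
        # Specific features: one flag per selected partner
        for partner in partners_of_interest:
            row[f"in_cluster_with_{partner}"] = int(partner in partners)
        rows.append(row)
    return rows
```

With made-up inputs such as `cluster_hits = [("GENE_A", 54, "TP53")]`, a mutation at `("GENE_A", 54)` would get `in_any_complex_cluster = 1` and `in_cluster_with_TP53 = 1`, while any other position of GENE_A would get zeros.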

St3451 added the discussion (Require discussion) and enhancement (New feature or request) labels Nov 20, 2024
St3451 changed the title from "Complexes" to "3D clustering on protein complexes" Nov 20, 2024
Collaborator Author

St3451 commented Jan 12, 2025

@koszulordie
