Benchmarking Diffusion Models for Monomeric Protein Structure Prediction

This repository contains the code and datasets for our systematic benchmark of state-of-the-art diffusion models for monomeric protein structure prediction. The analysis covers three leading models (AlphaFold 3, Protenix, and Chai-1) and evaluates their accuracy, robustness, and ability to detect subtle 3D structural variations in unseen protein structures.

Overview

Protein structure prediction is crucial for advancing computational biology, with applications ranging from drug design to understanding biological mechanisms. Recent advancements in diffusion models have revolutionized the field, enabling faster and more accurate predictions. This project explores:

The predictive performance of AlphaFold 3, Protenix, and Chai-1 on test sets of unseen proteins.
Sensitivity of AlphaFold 2 to single amino acid mutations, analyzing local structural deformation.
Comparative performance across difficulty levels using metrics like pLDDT, PAE, RMSD, and pTM.
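Of the metrics listed above, RMSD is the one computed directly from atomic coordinates. As a rough illustration (not necessarily how the notebook computes it), the sketch below measures per-residue deviation and overall RMSD between two sets of C-alpha coordinates. It assumes the two structures are already superimposed and matched residue by residue; the function names `per_residue_deviation` and `rmsd` and the toy coordinates are hypothetical.

```python
import math


def per_residue_deviation(ref, pred):
    """Euclidean distance between matched atoms, one value per residue."""
    if len(ref) != len(pred):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(ref, pred)]


def rmsd(ref, pred):
    """Root-mean-square deviation over matched atoms (no superposition step)."""
    deviations = per_residue_deviation(ref, pred)
    if not deviations:
        raise ValueError("no atoms to compare")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Toy C-alpha coordinates in angstroms, made up for illustration only.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]

print(per_residue_deviation(reference, predicted))  # local deformation profile
print(round(rmsd(reference, predicted), 3))         # single global score
```

The per-residue list is the more useful view for single-mutation comparisons, since a local deformation near the mutated position can be hidden by a low global RMSD.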

Key Features

Dataset: Includes newly released protein structures from CAMEO (2024) and experimental datasets (https://zenodo.org/records/10013253) with mutation-specific comparisons.
Metrics: Comprehensive evaluation using confidence scores, alignment errors, and structural deviations.
Code: Modular implementation for benchmarking diffusion-based prediction models.
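As an example of working with a confidence score, the sketch below reads per-residue pLDDT from PDB-format text and summarizes it. It assumes pLDDT is stored in the B-factor column, which is the AlphaFold convention; whether every model output used here follows it, and whether the files are PDB rather than mmCIF, are assumptions. All function names are hypothetical, and `atom_line` exists only to build toy input.

```python
def ca_plddt(pdb_text):
    """Collect the B-factor value of every C-alpha ATOM record."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(score):
    """Map a pLDDT value to the bands used by the AlphaFold database.

    How exact boundary values (90, 70, 50) are assigned is a choice made here.
    """
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def atom_line(serial, name, res, seq, x, y, z, b):
    """Build one fixed-width ATOM record (toy input only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} A{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b:6.2f}")


# Two residues with made-up coordinates and scores.
sample = "\n".join([
    atom_line(1, "N", "MET", 1, 0.0, 0.0, 0.0, 95.5),
    atom_line(2, "CA", "MET", 1, 1.5, 0.0, 0.0, 95.5),
    atom_line(3, "N", "GLY", 2, 3.0, 1.0, 0.0, 62.0),
    atom_line(4, "CA", "GLY", 2, 4.5, 1.0, 0.0, 62.0),
])

scores = ca_plddt(sample)
print(sum(scores) / len(scores))             # mean pLDDT over residues
print([confidence_band(s) for s in scores])  # per-residue confidence band
```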

Metrics Analysis

The metrics analysis runs in a Jupyter notebook on the data provided with this repository. Follow the steps below to install the dependencies and run the notebook.

Requirements

To install the necessary dependencies, ensure you have pip or conda installed in your environment. Then, run the following command:

With `pip`

pip install -r requirements.txt

Running the Notebook

Download the dataset from this link, unzip it, and place it in the data folder.
Download the dataset from this link, extract the "PDB" folder, and place it in the data folder.
Install the dependencies listed in the requirements.txt file.
Open the playground.ipynb notebook in the Jupyter web interface.
Run the cells in the notebook to analyze the metrics and explore the results. Each cell is designed to guide you through the analysis step by step.
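Before opening the notebook, it can help to confirm that the files from the steps above are in place. The following is a minimal sketch of such a check, assuming it is run from the repository root and that the extracted "PDB" folder ends up at data/PDB. The helper name `missing_inputs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_inputs(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    expected = [
        root / "data",               # unzipped datasets are placed here
        root / "data" / "PDB",       # the extracted "PDB" folder
        root / "requirements.txt",   # dependency list used by pip
        root / "playground.ipynb",   # the analysis notebook
    ]
    return [str(path) for path in expected if not path.exists()]


missing = missing_inputs()
if missing:
    print("Missing before running the notebook:")
    for path in missing:
        print("  " + path)
else:
    print("All expected files and folders are present.")
```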

