CryptoBench: A comprehensive dataset of protein-ligand cryptic binding sites

Cryptic binding sites are protein sites that are spatially malformed or inaccessible in the unbound state and become apparent only through an external factor, such as ligand binding (see the example below). Identifying these sites is important in many applications, from bioengineering to drug discovery. CryptoBench is a large-scale dataset designed to aid the development and evaluation of new cryptic binding site prediction methods.

Illustration of the unbound state of Cobyrinic acid a,c diamide synthase (PDB ID: 4PFS), with an obscured binding site. The ligand has been artificially added to highlight that it does not fit into the pocket in this state.

Illustration of the bound state of the same protein (PDB ID: 5IF9), showing that a (cryptic) binding site does exist.

About

CryptoBench contains over 1,000 structures, making it substantially larger than any previously available dataset. It can be used to train new cryptic binding site prediction methods, as demonstrated in the CryptoBench manuscript by training a protein language model-based baseline method.

The complete CryptoBench dataset, including train-test splits, CIF files, and PyMOL visualization scripts, is available on the Open Science Framework (OSF).

Tutorial

To make CryptoBench easier to work with, we provide the tutorial/tutorial.ipynb notebook. It gives step-by-step guidance on parsing the dataset, handling the train-test splits, and visualizing the data.

Overview

  1. For details on how the dataset was constructed, or to reproduce it, please refer to the src/README.md.
  2. A framework from this repository was used to train the benchmark method.
  3. Since the original PocketMiner code required minor adjustments to work, the forked PocketMiner repository, along with steps on how PocketMiner was evaluated on the CryptoBench test set, can be found here.

Benchmark method results

In the CryptoBench study, we evaluated the performance of three methods on the CryptoBench test set: the newly developed benchmark method (pLM-NN), PocketMiner, and P2Rank.

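| Method | Dataset | AUC | AUPRC | ACC | FPR | TPR | MCC | F1 Score |
|---|---|---|---|---|---|---|---|---|
| pLM-NN | CB-full [1] | 0.86 | 0.36 | 0.93 | 0.05 | 0.48 | 0.39 | 0.92 |
| pLM-NN | CB-PM [2] | 0.88 | 0.43 | 0.93 | 0.04 | 0.52 | 0.44 | 0.93 |
| PocketMiner | CB-PM | 0.76 | 0.19 | 0.82 | 0.16 | 0.51 | 0.22 | 0.78 |
| pLM-NN | CB-P2RANK-apo [3] | 0.88 | 0.42 | 0.93 | 0.04 | 0.51 | 0.43 | 0.93 |
| P2RANK | CB-P2RANK-apo | 0.81 | 0.21 | 0.85 | 0.14 | 0.62 | 0.27 | 0.81 |
| P2RANK | CB-P2RANK-holo [4] | 0.89 | 0.34 | 0.85 | 0.15 | 0.84 | 0.38 | 0.81 |

The bracketed numbers refer to the entries under Footnotes.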
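If you want to compute comparable numbers for your own predictions, the metric names in the table can be illustrated with a minimal, standard-library sketch. It assumes binary ground-truth labels (1 = binding, 0 = non-binding) and one prediction score per item; the function names, the 0.5 threshold, and the toy data are all hypothetical and are not taken from the CryptoBench code. The F1 here is the plain positive-class F1, which may differ from the averaging used for the table above.

```python
from math import sqrt


def threshold_metrics(y_true, y_pred):
    """Compute ACC, TPR, FPR, F1 and MCC from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall / sensitivity
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "F1": f1, "MCC": mcc}


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("roc_auc needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.1, 0.2, 0.8, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

metrics = threshold_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise AUC above is quadratic in the number of items, which is fine for a sanity check but slow on large inputs; a rank-based implementation from an established library is a better fit for full-scale evaluation.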

If you would like to evaluate your method using this dataset or compare your predictions with the benchmark, feel free to reach out! You can contact us via GitHub Issues or by email, which can be found in the paper.

How to cite:

If you use CryptoBench, please cite the paper:

  • Vít Škrhák, Marian Novotný, Christos P Feidakis, Radoslav Krivák, David Hoksza, CryptoBench: cryptic protein–ligand binding sites dataset and benchmark, Bioinformatics, Volume 41, Issue 1, January 2025, btae745, https://doi.org/10.1093/bioinformatics/btae745

or, if you prefer the BibTeX format:

@article{skrhak2024cryptobench,
    author = {Škrhák, Vít and Novotný, Marian and Feidakis, Christos P and Krivák, Radoslav and Hoksza, David},
    title = {CryptoBench: Cryptic protein-ligand binding sites dataset and benchmark},
    journal = {Bioinformatics},
    pages = {btae745},
    year = {2024},
    month = {12},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btae745},
    url = {https://doi.org/10.1093/bioinformatics/btae745},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btae745/61228599/btae745.pdf},
}

Contact us

If you have any questions regarding the usage of the dataset or its assembly, comparing your method against the benchmark, or if you have any suggestions, please feel free to contact us by raising an issue!

License

This source code is licensed under the MIT license.

Footnotes

  1. CB-full denotes the whole CryptoBench test set.

  2. CB-PM denotes the subset on which PocketMiner was evaluated.

  3. CB-P2RANK-apo denotes the subset on which P2Rank was evaluated.

  4. CB-P2RANK-holo denotes the holo counterparts of the CB-P2RANK-apo subset.