Scripts for the creation of CryptoBench, a new dataset of cryptic binding sites

skrhakv/CryptoBench

CryptoBench: A comprehensive dataset of protein-ligand cryptic binding sites

Protein cryptic binding sites are sites that are spatially malformed or inaccessible in the unbound state but become apparent through some external factor, such as ligand binding (see the example below). Identifying these sites is important in many applications, from bioengineering to drug discovery. CryptoBench is a large-scale dataset designed to aid the development and evaluation of new cryptic binding site prediction methods.

Illustration of the unbound state of Cobyrinic acid a,c diamide synthase (PDB ID: 4PFS), with an obscured binding site. The ligand has been artificially added to highlight that it does not fit into the pocket in this state.

Illustration of the bound state of the same protein (PDB ID: 5IF9), showing that a (cryptic) binding site does exist.

About

CryptoBench contains over 1,000 structures, making it substantially larger than any previously available dataset. It can be used to train novel cryptic binding site prediction methods, as demonstrated in the CryptoBench manuscript by training a baseline method based on a protein language model.

The complete CryptoBench dataset, including train-test splits, CIF files, and PyMOL visualization scripts, is available on OSF (the Open Science Framework).

Tutorial

To facilitate working with CryptoBench, we provide a tutorial notebook, tutorial/tutorial.ipynb. It gives step-by-step guidance on parsing the dataset, handling the train-test splits, and visualizing the data.
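
As a rough orientation before opening the notebook, the sketch below shows one way to separate a parsed dataset into train and test subsets and to collect pocket residues for a single apo structure. The record layout is an assumption made for illustration only: the keys `holo_id` and `pocket_residues`, the placeholder IDs, and the helper names `split_dataset` and `pocket_residue_union` are all hypothetical. The actual file names and schema are described in tutorial/tutorial.ipynb and in the OSF files.

```python
import json

# Hypothetical layout, for illustration only: each apo structure ID maps to a
# list of holo records. The real CryptoBench schema may differ.
SAMPLE_DATASET = json.loads("""
{
  "apo1": [
    {"holo_id": "holo1", "pocket_residues": ["A_10", "A_11"]},
    {"holo_id": "holo2", "pocket_residues": ["A_11", "A_42"]}
  ],
  "apo2": [
    {"holo_id": "holo3", "pocket_residues": ["B_7"]}
  ]
}
""")

# Hypothetical set of apo IDs assigned to the test split.
SAMPLE_TEST_IDS = {"apo2"}


def split_dataset(dataset, test_ids):
    """Return (train, test) dictionaries, partitioned by apo structure ID."""
    train = {k: v for k, v in dataset.items() if k not in test_ids}
    test = {k: v for k, v in dataset.items() if k in test_ids}
    return train, test


def pocket_residue_union(records):
    """Collect every residue that belongs to a pocket in any holo record."""
    residues = set()
    for record in records:
        residues.update(record["pocket_residues"])
    return sorted(residues)


if __name__ == "__main__":
    train, test = split_dataset(SAMPLE_DATASET, SAMPLE_TEST_IDS)
    print(len(train), "train structures;", len(test), "test structures")
    print(pocket_residue_union(SAMPLE_DATASET["apo1"]))
```

Partitioning by apo structure ID keeps all holo records of one protein in the same subset, so no structure appears in both train and test within this toy example.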

Overview

  1. For details on how the dataset was constructed, or to reproduce it, please refer to src/README.md.
  2. A framework from this repository was used to train the benchmark method.

How to cite:

If you use CryptoBench, please cite the paper:

  • Vít Škrhák, Marian Novotný, Christos P Feidakis, Radoslav Krivák, David Hoksza, CryptoBench: cryptic protein–ligand binding sites dataset and benchmark, Bioinformatics, Volume 41, Issue 1, January 2025, btae745, https://doi.org/10.1093/bioinformatics/btae745

or, if you prefer the BibTeX format:

@article{skrhak2024cryptobench,
    author = {Škrhák, Vít and Novotný, Marian and Feidakis, Christos P and Krivák, Radoslav and Hoksza, David},
    title = {CryptoBench: Cryptic protein-ligand binding sites dataset and benchmark},
    journal = {Bioinformatics},
    pages = {btae745},
    year = {2024},
    month = {12},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btae745},
    url = {https://doi.org/10.1093/bioinformatics/btae745},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btae745/61228599/btae745.pdf},
}

Contact us

If you have any questions regarding the usage of the dataset or its assembly, comparing your method against the benchmark, or if you have any suggestions, please feel free to contact us by raising an issue!

License

This source code is licensed under the MIT license.
