BpForms: unambiguous representation of modified DNA, RNA, and proteins

BpForms is a set of tools for unambiguously representing the structures of modified forms of biopolymers such as DNA, RNA, and protein.

  • The BpForms notation can unambiguously represent the structure of modified forms of biopolymers. For example, the following represents a modified DNA molecule that contains a deoxyinosine residue at the fourth position.
    ACG[id: "dI" | structure: InChI=1S/C10H12N4O4/c15-2-6-5(16)1-7(18-6)14-4-13-8-9(14)11-3-12-10(8)17/h3-7,15-16H,1-2H2,(H,11,12,17)/t5-,6+,7+/m0/s1]T
    
  • This concrete representation of modified biopolymers enables the BpForms software tools to calculate the formulae, molecular weights, and charges of biopolymers, as well as automatically protonate biopolymers for specific pHs.
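To illustrate why a concrete structure makes such calculations possible, the sketch below reads the formula layer out of the InChI string from the example above and sums approximate atomic weights. It is a minimal, standard-library illustration and not the BpForms API: the helper names (`inchi_formula`, `element_counts`, `molecular_weight`) and the small weight table are assumptions made for this example. It computes the weight of the free deoxyinosine nucleoside, not of the residue inside the polymer.

```python
import re

# Approximate standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# InChI of deoxyinosine, copied from the example above.
DI_INCHI = (
    "InChI=1S/C10H12N4O4/c15-2-6-5(16)1-7(18-6)14-4-13-8-9(14)11-3-12-10(8)17"
    "/h3-7,15-16H,1-2H2,(H,11,12,17)/t5-,6+,7+/m0/s1"
)


def inchi_formula(inchi):
    """Return the formula layer of an InChI string, e.g. 'C10H12N4O4'."""
    parts = inchi.split("/")
    if len(parts) < 2 or not parts[0].startswith("InChI="):
        raise ValueError("not an InChI string")
    return parts[1]


def element_counts(formula):
    """Parse a simple formula such as 'C10H12N4O4' into {element: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts):
    """Sum atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())


counts = element_counts(inchi_formula(DI_INCHI))
print(counts)
print(round(molecular_weight(counts), 2))
```

The BpForms tools go further than this sketch: they account for the bonds between monomers and for protonation state, neither of which is handled here.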

BpForms encompasses five tools:

BpForms was motivated by the need to concretely represent the biochemistry of DNA modification, DNA repair, post-transcriptional processing, and post-translational processing in whole-cell computational models. In addition, BpForms is a valuable tool for experimental proteomics. In particular, we developed BpForms because there were no notations, schemas, data models, or file formats for concretely representing modified forms of biopolymers, despite the existence of several databases and ontologies of DNA, RNA, and protein modifications and the ProForma Proteoform Notation.

The BpForms syntax was inspired by the ProForma Proteoform Notation. BpForms improves upon this syntax in several ways:

  • BpForms separates the representation of modified biopolymers from the chemical processes which generate them.
  • BpForms clarifies the representation of multiply modified monomers. This is necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
  • BpForms can represent any modification and, therefore, is not limited to previously enumerated modifications. This is also necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
  • BpForms supports two additional types of uncertainty in the structures of biopolymers: uncertainty in the positions of modifications and uncertainty in the charges of modifications.
  • BpForms has a concrete grammar. This enables error checking, as well as the calculation of formulae, masses, and charges, which is essential for modeling.
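As a rough illustration of the error checking that a concrete grammar allows, the sketch below splits a sequence into monomers and rejects malformed input. It is not the BpForms parser: the function name `tokenize`, the four-letter DNA alphabet, and the simplification that a monomer definition never contains a `]` character are all assumptions made for this example.

```python
# Assumption: the four canonical DNA monomer codes.
DNA_CODES = frozenset("ACGT")


def tokenize(sequence, alphabet=DNA_CODES):
    """Split a sequence into single-letter codes and bracketed monomers.

    Raises ValueError for an unclosed '[' or a character outside the alphabet.
    """
    tokens = []
    i = 0
    while i < len(sequence):
        ch = sequence[i]
        if ch == "[":
            # Simplification: the first ']' closes the monomer definition.
            end = sequence.find("]", i)
            if end == -1:
                raise ValueError(f"unclosed '[' at position {i}")
            tokens.append(sequence[i:end + 1])
            i = end + 1
        elif ch in alphabet:
            tokens.append(ch)
            i += 1
        elif ch.isspace():
            i += 1  # ignore whitespace between monomers
        else:
            raise ValueError(f"unknown monomer code {ch!r} at position {i}")
    return tokens


tokens = tokenize('ACG[id: "dI"]T')
print(len(tokens), tokens[3])
```

Because every character must belong to a known token, mistakes such as a missing closing bracket or an unrecognized code are reported with their position instead of being silently accepted.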

Installation

  1. Install dependencies

    • Open Babel (http://openbabel.org)
    • Pip (https://pip.pypa.io) >= 18.0
    • Python (https://www.python.org) >= 3.6
  2. Install this package

    • Install the latest release from PyPI

      pip install bpforms
      
    • Install the latest revision from GitHub

      pip install git+https://github.com/KarrLab/bpforms.git#egg=bpforms
      

Examples, tutorial, and documentation

Please see the documentation.

License

The package is released under the MIT license.

Development team

This package was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA.

Questions and comments

Please contact the Karr Lab with any questions or comments.
