Commit

Update abstract
rcalef committed Dec 9, 2024
1 parent 79f132c commit fddab5f
Showing 1 changed file with 6 additions and 13 deletions.
19 changes: 6 additions & 13 deletions docs/index.html
@@ -229,19 +229,12 @@ <h2 class="subtitle has-text-left is-size-5 has-text-weight-normal">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
-Understanding the roles of human proteins remains a major challenge, with approximately 20%
-of human proteins lacking known functions and over 40% missing context-specific functional insights.
-Even well-annotated proteins are often poorly characterized across diverse biological contexts, disease states, and perturbations.
-We present ProCyon, a foundation model for modeling, generating, and predicting protein phenotypes across five interrelated knowledge
-domains: molecular functions, therapeutic mechanisms, disease associations, functional protein domains, and molecular interactions.
-By analyzing millions of human protein phenotypes, ProCyon integrates proteins with phenotype text prompts through co-training a large
-language model with protein encoders. ProCyon processes interleaved phenotypes and protein data, enables zero-shot phenotype annotation,
-phenotype generation, and protein retrieval from natural language prompts.
-ProCyon achieves substantial performance gains in zero-shot tasks and phenotype generation across a range of applications, including
-identifying drug-binding domains for MGAM, predicting peptide binding with the ACE2 enzyme, and assessing the functional impact of
-Alzheimer's disease variants. It enables conditional retrieval of proteins associated with small molecule drugs through complementary
-mechanisms of action and generates candidate phenotypes for under-characterized proteins linked to Parkinson's disease. ProCyon provides a
-valuable tool for functional protein biology.
+Understanding the roles of human proteins remains a major challenge, with approximately 20\% of human proteins lacking known functions and more than 40\% missing context-specific functional insights. Even well-annotated proteins are often poorly characterized in diverse biological contexts, disease states, and perturbations.
+We present ProCyon, a foundation model for modeling, generating, and predicting protein phenotypes across five interrelated knowledge domains: molecular functions, therapeutic mechanisms, disease associations, functional protein domains, and molecular interactions. To support this, we created \datasetname, a dataset of 33 million protein phenotype instructions, representing a comprehensive resource for multiscale protein phenotypes.
+By co-training a large language model with multimodal molecular encoders, ProCyon integrates phenotypic and protein data. A novel architecture and instruction tuning strategy allow ProCyon to process arbitrarily interleaved protein-and-phenotype inputs, achieve zero-shot task transfer, and generate free-form text phenotypes interleaved with retrieved protein sequence, structure, and drug modalities in a single unified model.
+ProCyon achieves strong performance against single-modality models, multimodal models such as ESM3, as well as text-only LLMs on dozens of benchmarking tasks such as contextual protein retrieval and question answering.
+We extensively evaluate ProCyon for biological applications, including identifying protein domains that bind small molecule drugs, predicting peptide binding with enzymes, and assessing the functional impact of Alzheimer's disease mutations. ProCyon enables conditional retrieval of proteins linked to small molecules through complementary mechanisms of action. It generates candidate phenotypes for under-characterized proteins recently implicated in Parkinson's disease, facilitating hypothesis generation for poorly understood proteins and biological processes.
+ProCyon paves the way toward an effective, general solution for functional protein biology that can enable new insights into the human proteome.
</p>
</div>
</div>
