BioNLPDatasets

Repo for Bio NLP Resources

Chemical Disease Interaction

BC5CDR: The BC5CDR corpus consists of 1,500 PubMed articles with 4,409 annotated chemicals, 5,818 diseases, and 3,116 chemical-disease interactions.
CTD: The weakly labeled corpus used in Peng et al. (2016) consists of 18,410 abstracts and 33,224 CID (chemical-induced disease) relations. The raw data were extracted from curated data in the CTD-Pfizer collaboration, with document-level annotations of drug-disease and drug-phenotype interactions.
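
Entity and relation counts like those quoted above can be recomputed from an annotation file. The sketch below assumes a PubTator-style layout (pipe-separated title and abstract lines, tab-separated mention lines, and four-field relation lines); this README does not state the file format, so treat the layout as an assumption and adjust it to the files you actually download. `SAMPLE`, `parse_pubtator`, and the placeholder identifiers are hypothetical names used only for illustration.

```python
from collections import Counter

# Toy record in a PubTator-style layout. The text, offsets, and identifiers
# are made up for illustration; nothing here is taken from BC5CDR.
SAMPLE = (
    "1|t|Aspirin and gastric ulcer.\n"
    "1|a|A toy abstract about aspirin.\n"
    "1\t0\t7\tAspirin\tChemical\tCHEM_ID_1\n"
    "1\t12\t25\tgastric ulcer\tDisease\tDIS_ID_1\n"
    "1\tCID\tCHEM_ID_1\tDIS_ID_1\n"
)


def _new_doc():
    return {"title": "", "abstract": "", "entities": [], "relations": []}


def parse_pubtator(text):
    """Group title, abstract, entity mentions, and relations by document ID."""
    docs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        head = line.split("|", 2)
        if len(head) == 3 and head[1] in ("t", "a") and "\t" not in head[0]:
            # Title or abstract line: doc ID, marker, text.
            doc = docs.setdefault(head[0], _new_doc())
            doc["title" if head[1] == "t" else "abstract"] = head[2]
            continue
        fields = line.split("\t")
        doc = docs.setdefault(fields[0], _new_doc())
        if len(fields) == 4:
            # Relation line: doc ID, relation type, first ID, second ID.
            doc["relations"].append(tuple(fields[1:]))
        elif len(fields) >= 6:
            # Mention line: doc ID, start, end, mention text, type, identifier.
            doc["entities"].append({
                "start": int(fields[1]),
                "end": int(fields[2]),
                "text": fields[3],
                "type": fields[4],
                "id": fields[5],
            })
    return docs


docs = parse_pubtator(SAMPLE)
type_counts = Counter(e["type"] for d in docs.values() for e in d["entities"])
relation_count = sum(len(d["relations"]) for d in docs.values())
print(dict(type_counts), relation_count)
```

Running the same function over a full corpus file (read with `open(path, encoding="utf-8").read()`) gives per-type mention counts and a relation total, which is a quick sanity check after downloading.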

Chemical and Drug

BioCreative IV Chemical and Drug (BC4CHEMD)

Relation Extraction

Gene-Disease

GAD: The Genetic Association Database (GAD) is an archive of human genetic association studies of complex diseases, including summary data extracted from publications on candidate gene and GWAS studies. We use GAD for the development of a corpus on associations between genes and diseases (downloaded on January 21st, 2013).
EU-ADR: The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts annotated a set of 100 abstracts.

Chemical-Protein

ChemProt:

Protein-Protein

PPI: A new and much improved binarization of BioInfer, as reported in Heimonen et al., "Complex-to-Pairwise Mapping of Biological Relationships using a Semantic Network Representation."

Drug-Drug Interaction

DDIExtraction2013: The DDIExtraction2013 Shared Task focuses on extraction of drug-drug interactions.

Drug-ADE

ADE: A benchmark corpus developed to support the automatic extraction of drug-related adverse effects from medical case reports.
TAC2017:
SMM4H: Fourth Social Media Mining for Health (#SMM4H) Shared Task at ACL 2019
ADRMine: Corpus from the paper "Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts."

Large Scale Pubmed Corpus

Pubmed

Pubmed Phrases: The dataset contains a collection of 705,915 PubMed Phrases (Kim et al., 2018) that are beneficial for information retrieval and human comprehension.

Useful Links

http://biocreative.sourceforge.net/bio_corpora_links.html

Contents

Named Entity Recognition

Disease

Mutation Mentions of various kinds (Protein, DNA...)

Chemical Disease Interaction

Chemical and Drug

Relation Extraction

Gene-Disease

Chemical-Protein

Protein-Protein

Drug-Drug Interaction

Drug-ADE

Large Scale Pubmed Corpus

Pubmed

Useful Links
