BioCreative VI — Track 5: text mining chemical-protein interactions (ChemProt)

This repository contains the code of our system for the ChemProt task.

Requirements

Ubuntu, Python 3.6.4. Install the required packages:

$ pip install -r requirements.txt

Usage

Scripts

confusion.py: Calculate the confusion matrix and other statistics given a file with predicted relations.
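As a rough illustration of the kind of statistics involved, the sketch below builds a confusion matrix from gold and predicted relation labels and derives micro-averaged precision, recall, and F1-score over the positive classes. It is not the code in confusion.py: the function names, the `NONE` label for negative instances, and the toy label lists are assumptions made for this example only.

```python
def confusion_matrix(gold, pred, labels):
    """Return nested counts where counts[g][p] is the number of
    instances with gold label g that were predicted as p."""
    counts = {g: {p: 0 for p in labels} for g in labels}
    for g, p in zip(gold, pred):
        counts[g][p] += 1
    return counts


def micro_prf(counts, positive_labels):
    """Micro-averaged precision, recall and F1 over the positive labels."""
    tp = sum(counts[label][label] for label in positive_labels)
    predicted_pos = sum(counts[g][p] for g in counts for p in positive_labels)
    gold_pos = sum(counts[g][p] for g in positive_labels for p in counts[g])
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): "NONE" marks the absence of a relation.
labels = ["CPR:3", "CPR:4", "CPR:9", "NONE"]
gold = ["CPR:3", "CPR:4", "NONE", "CPR:4", "NONE"]
pred = ["CPR:3", "NONE", "NONE", "CPR:4", "CPR:9"]

counts = confusion_matrix(gold, pred, labels)
precision, recall, f1 = micro_prf(counts, ["CPR:3", "CPR:4", "CPR:9"])
print(counts["CPR:4"])
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Keeping the negative label in the matrix but out of `positive_labels` means that missed and spurious relations still lower recall and precision, respectively.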

create_embeddings.py: Create pre-trained part-of-speech and dependency embedding vectors.

main.py: Train and test a deep learning model, either a bidirectional long short-term memory (BiLSTM) recurrent network or a convolutional neural network (CNN). The input arguments are set by editing the script; only the seed number can be passed on the command line:

$ python main.py 2
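A minimal sketch of how a script might read such a seed and use it for reproducibility is shown below. The helper name `parse_seed` and the default value are hypothetical, and only Python's built-in `random` module is seeded here; which libraries main.py actually seeds is not stated in this README.

```python
import random


def parse_seed(argv, default=0):
    """Return the seed given as the first command-line argument,
    or `default` when no argument is present (hypothetical helper)."""
    return int(argv[1]) if len(argv) > 1 else default


# Simulates the argument vector of `python main.py 2`.
seed = parse_seed(["main.py", "2"])

# Seeding twice with the same value reproduces the same random draws.
random.seed(seed)
first_draw = random.random()
random.seed(seed)
second_draw = random.random()
print(seed, first_draw == second_draw)
```

In a real script the argument vector would come from `sys.argv` rather than a hard-coded list.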

mfuncs.py: Functions used by the main.py script.

support.py: Auxiliary code for handling the ChemProt dataset.

utils.py: General use utilities.

voting.py: Average several outputs (probabilities). Edit the script to choose the input directory and the group to be evaluated.
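The averaging idea can be sketched as follows: the class probabilities from several model outputs are averaged per instance, and the class with the highest mean probability is chosen. This is an illustration under assumptions rather than the code in voting.py; the function names, the in-memory list format, the `NONE` label, and the toy numbers are all hypothetical.

```python
def average_probabilities(outputs):
    """Average class probabilities element-wise across model outputs.

    `outputs` is a list with one entry per model output; each entry is a
    list of per-instance probability lists, all in the same order.
    """
    n_outputs = len(outputs)
    averaged = []
    for instance_probs in zip(*outputs):  # same instance across outputs
        averaged.append([sum(col) / n_outputs for col in zip(*instance_probs)])
    return averaged


def argmax_labels(probs, labels):
    """Pick the label with the highest probability for each instance."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in probs]


# Toy probabilities (hypothetical) for two instances and three outputs.
labels = ["NONE", "CPR:3", "CPR:4"]
output_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
output_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
output_c = [[0.4, 0.5, 0.1], [0.3, 0.3, 0.4]]

averaged = average_probabilities([output_a, output_b, output_c])
voted = argmax_labels(averaged, labels)
print(voted)  # ['CPR:3', 'CPR:4']
```

Averaging probabilities, rather than counting hard votes, lets a confident output outweigh two uncertain ones, as happens for the second instance above.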

Datasets

The datasets were pre-processed (tokenization, sentence splitting, part-of-speech tagging, and dependency parsing) with the Turku Event Extraction System (TEES). They are available for download as data.zip [Mirror 1] [Mirror 2].

Word embeddings

Our word embedding models were created from PubMed English abstracts. We also pre-trained part-of-speech and dependency embedding vectors from the ChemProt dataset. Available for download as word2vec.zip [Mirror 1] [Mirror 2].

We also tested the word embeddings model created by Chen et al. (2018) [Paper] [Code].

Supplementary data

Statistics about the datasets, and some prediction files. Available for download as supp.zip [Mirror 1] [Mirror 2].

Reference

If you use this code or data in your work, please cite our publication:

@article{antunes2019a,
  author    = {Antunes, Rui and Matos, S{\'e}rgio},
  journal   = {Database},
  month     = oct,
  number    = {baz095},
  publisher = {{Oxford University Press}},
  title     = {Extraction of chemical--protein interactions from the literature using neural networks and narrow instance representation},
  url       = {https://doi.org/10.1093/database/baz095},
  volume    = {2019},
  year      = {2019},
}