Relation extraction through distal supervision with CNNs. Reproducing Zeng et al's work here: http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP203.pdf
Download the NYT10 relation extraction dataset from: https://github.com/thunlp/NRE/blob/master/data.zip and extract to the top-most level of the repository. Also download the Wikipedia 2014 + Gigaword 5 distribution of the pretrained GloVe vectors from here: http://nlp.stanford.edu/data/glove.6B.zip. Structure should be:
└── data/
├── entity2id.txt
├── relation2id.txt
├── test.txt
├── train.txt
└── glove.6B.50d.txt
The dataset is in the following format:
- train.txt: training file, format (fb_mid_e1, fb_mid_e2, e1_name, e2_name, relation, sentence).
- test.txt: test file, same format as train.txt.
- entity2id.txt: all entities and corresponding ids, one per line.
- relation2id.txt: all relations and corresponding ids, one per line.
- glove.6B.50d.txt: the pre-trained GloVe word embedding file hosted by Stanford NLP