- Sequence to Sequence Learning with Neural Networks, https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf, 2014.
- Neural Machine Translation by Jointly Learning to Align and Translate, https://arxiv.org/pdf/1409.0473.pdf, 2015 (Additive Attention).
- Effective Approaches to Attention-based Neural Machine Translation, https://arxiv.org/pdf/1508.04025.pdf, 2015 (Multiplicative Attention).
- A Structured Self-Attentive Sentence Embedding, https://arxiv.org/pdf/1703.03130.pdf, ICLR 2017 (Self-attention).
- Long Short-Term Memory-Networks for Machine Reading, https://arxiv.org/pdf/1601.06733.pdf, EMNLP 2016 (Self-attention).
- A Decomposable Attention Model for Natural Language Inference, https://arxiv.org/pdf/1606.01933.pdf, EMNLP 2016 (Self-attention).
- A Deep Reinforced Model for Abstractive Summarization, https://arxiv.org/pdf/1705.04304.pdf, 2017 (Self-attention).
- Frustratingly Short Attention Spans in Neural Language Modeling, https://arxiv.org/pdf/1702.04521.pdf, ICLR 2017 (Key-value attention).
- Attention Is All You Need, https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf, NIPS 2017.
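The scaled dot-product attention at the core of "Attention Is All You Need" fits in a few lines. A minimal NumPy sketch (the shapes and variable names are illustrative, not taken from any of the papers above):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_queries, n_keys) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax over the keys
    return weights @ V                             # weighted sum of the values

# Toy example: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```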
- Efficient Estimation of Word Representations in Vector Space, https://arxiv.org/pdf/1301.3781.pdf, 2013.
- Distributed Representations of Words and Phrases and their Compositionality, http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf, 2013.
- Distributed Representations of Sentences and Documents, https://arxiv.org/pdf/1405.4053.pdf, 2014.
- GloVe: Global Vectors for Word Representation, https://www.aclweb.org/anthology/D14-1162, 2014.
- Semi-supervised Sequence Learning, https://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf, 2015.
- Deep contextualized word representations, https://aclweb.org/anthology/N18-1202, 2018.
- Universal Language Model Fine-tuning for Text Classification, https://aclweb.org/anthology/P18-1031, 2018.
- Improving Language Understanding by Generative Pre-Training, https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/pdf/1810.04805.pdf, 2018.
- Language Models are Unsupervised Multitask Learners, https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf, 2019.
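Pre-trained GloVe vectors are distributed as plain text, one word per line followed by its vector. A small sketch of loading them and comparing words by cosine similarity (the file name below is an assumption; substitute whichever release you download):

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file: each line is a word followed by its float vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# vectors = load_glove("glove.6B.100d.txt")         # path is a hypothetical example
# print(cosine(vectors["king"], vectors["queen"]))  # related words score close to 1
```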
- Bidirectional LSTM-CRF Models for Sequence Tagging, https://arxiv.org/pdf/1508.01991.pdf, 2015.
- Named Entity Recognition with Bidirectional LSTM-CNNs, https://www.aclweb.org/anthology/Q16-1026, 2016.
- Neural Architectures for Named Entity Recognition, https://arxiv.org/pdf/1603.01360.pdf, 2016.
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, https://arxiv.org/pdf/1603.01354.pdf, 2016.
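The tagging papers above share a common skeleton: a bidirectional LSTM producing per-token tag scores, optionally decoded with a CRF. A minimal PyTorch sketch of that skeleton without the CRF layer (hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM emitting per-token tag logits. A CRF layer, as in
    Huang et al. (2015), would replace the independent per-token softmax
    with transition-aware Viterbi decoding."""
    def __init__(self, vocab_size, tagset_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, tagset_size)  # forward + backward states

    def forward(self, token_ids):                # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2 * hidden_dim)
        return self.out(h)                       # per-token tag logits

model = BiLSTMTagger(vocab_size=10_000, tagset_size=9)  # e.g. 9 BIO tags
logits = model(torch.randint(0, 10_000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 9])
```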
- A Convolutional Neural Network for Modelling Sentences, https://arxiv.org/pdf/1404.2188.pdf, ACL 2014.
- Convolutional Neural Networks for Sentence Classification, https://arxiv.org/pdf/1408.5882.pdf, EMNLP 2014.
- Character-level Convolutional Networks for Text Classification, https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf, NIPS 2015.
- Very Deep Convolutional Networks for Text Classification, https://www.aclweb.org/anthology/E17-1104, EACL 2017.
- Deep Pyramid Convolutional Neural Networks for Text Categorization, https://aclweb.org/anthology/P17-1052, ACL 2017.
- A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification, https://www.aclweb.org/anthology/I17-1026, IJCNLP 2017.
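The Kim (2014) classifier cited above is compact enough to sketch: parallel 1-D convolutions over word embeddings, max-over-time pooling, then a linear classifier. A PyTorch sketch (filter sizes and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Kim (2014)-style sentence classifier: parallel 1-D convolutions over
    the embedded sequence, max-over-time pooling per filter, then a linear
    layer over the concatenated pooled features."""
    def __init__(self, vocab_size, num_classes, embed_dim=100,
                 num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, num_classes)

model = TextCNN(vocab_size=10_000, num_classes=2)
print(model(torch.randint(0, 10_000, (4, 50))).shape)  # torch.Size([4, 2])
```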