⭐ - ablation analysis

Segmentation, Tagging, Parsing, SRL

Michael Collins: Head-Driven Statistical Models for Natural Language Parsing, PhD Dissertation, University of Pennsylvania, 1999.
Michael Collins: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, EMNLP 2002. (Received Best Paper Award)
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, Eliyahu Kiperwasser, Yoav Goldberg, TACL 2016 arxiv
Deep Semantic Role Labeling: What Works and What’s Next, Luheng He, Kenton Lee, Mike Lewis, Luke Zettlemoyer, ACL 2017 paper ⭐

Machine Translation & Transliteration, Sequence-to-Sequence Models

Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, NIPS 2017 arxiv

Get To The Point: Summarization with Pointer-Generator Networks, Abigail See, Peter J. Liu, Christopher D. Manning, ACL 2017 arxiv | code | pytorch code

Matthew E. Peters, et al.: Deep contextualized word representations, 2018. arxiv
Jacob Devlin, et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018. arxiv

Natural Language Processing (almost) from Scratch, Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa, JMLR 2011 arxiv ⭐⭐⭐⭐⭐

Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014 arxiv | seq2seq ⭐⭐⭐⭐⭐