NLU · NLG · NLI · POS · NER · MT · QA
A collection of NLP contributions, code, and data
- [2023-05] LIMA: Less Is More for Alignment paper
- [2023-05] RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs paper
- [2023-05] Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision paper
- [2023-05] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback paper
- [2023-04] Fundamental Limitations of Alignment in Large Language Models paper
- [2020/01] Reformer: The Efficient Transformer : Reformer
  - Replaces dot-product attention with an approximation based on locality-sensitive hashing.
  - Uses reversible residual layers instead of the standard residuals; a sketch of the LSH-bucketed attention idea follows below.
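  A minimal, illustrative sketch of the LSH-attention idea (not the paper's implementation): positions are hashed into buckets with random hyperplanes, and softmax attention is computed only within each bucket. The function names, shapes, and single-round hashing are simplifying assumptions.

  ```python
  # Hypothetical sketch of LSH-bucketed attention in NumPy (illustrative only).
  import numpy as np

  def lsh_bucket_ids(x, n_hashes=4, seed=0):
      """Hash each position into a bucket via signs of random hyperplane projections."""
      rng = np.random.default_rng(seed)
      planes = rng.normal(size=(x.shape[-1], n_hashes))   # random hyperplanes
      signs = (x @ planes) > 0                             # (seq_len, n_hashes) sign bits
      return signs @ (2 ** np.arange(n_hashes))            # integer bucket id per position

  def lsh_attention(q, k, v, n_hashes=4):
      """Attend only within LSH buckets, so cost grows with bucket size, not seq_len^2."""
      seq_len, d = q.shape
      buckets = lsh_bucket_ids(q, n_hashes)                # Reformer ties Q and K, so one hash covers both
      out = np.zeros_like(v)
      for b in np.unique(buckets):
          idx = np.where(buckets == b)[0]
          scores = (q[idx] @ k[idx].T) / np.sqrt(d)        # attention restricted to the bucket
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)
          out[idx] = weights @ v[idx]
      return out

  # Toy usage: 128 positions, 16-dim heads, with tied Q/K.
  q = k = np.random.randn(128, 16)
  v = np.random.randn(128, 16)
  print(lsh_attention(q, k, v).shape)  # (128, 16)
  ```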
- [2020/01] PoWER-BERT: Accelerating BERT inference for Classification Tasks : PoWER-BERT
- [2020/01] Towards a Human-like Open-Domain Chatbot + Google AI Blog : Meena
- [2020/03] ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators : ELECTRA
- [2020/03] Data Augmentation using Pre-trained Transformer Models
- [2020/04] Longformer: The Long-Document Transformer : Longformer
- [2020/04] You Impress Me: Dialogue Generation via Mutual Persona Perception
- [2020/04] Recipes for building an open-domain chatbot
- [2020/04] ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues : ToD-BERT
- [2020/04] SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model : SOLOIST
- [2020/04] FastBERT: a Self-distilling BERT with Adaptive Inference Time : FastBERT
- [2020/04] FLAT: Chinese NER Using Flat-Lattice Transformer : FLAT
- [2020/05] A Simple Language Model for Task-Oriented Dialogue [Dataset]
  - Research paper on task-oriented dialogue.
  - Task-oriented dialogue focuses on detecting user intent, deciding system actions according to a dialogue policy, and generating responses.
  - The proposed SimpleTOD approach achieves state-of-the-art performance on dialogue state tracking.
  - Task-oriented dialogue research is moving toward end-to-end solutions; a sketch of the single-sequence format follows below.
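  A minimal sketch of how a SimpleTOD-style training example can be serialized into one sequence for an autoregressive LM. The delimiter tokens and helper name below are illustrative assumptions, not the official implementation.

  ```python
  # Hypothetical serialization of one dialogue turn into a single LM training string.
  def build_simpletod_sequence(context_turns, belief_state, actions, response):
      """Flatten context, belief state, system actions, and response into one sequence."""
      context = " ".join(context_turns)
      belief = ", ".join(f"{domain} {slot} {value}" for domain, slot, value in belief_state)
      action = ", ".join(actions)
      return (
          f"<|context|> {context} <|endofcontext|> "
          f"<|belief|> {belief} <|endofbelief|> "
          f"<|action|> {action} <|endofaction|> "
          f"<|response|> {response} <|endofresponse|>"
      )

  # The model is trained to continue everything after the context, so dialogue
  # state tracking, policy, and response generation share a single decoder.
  print(build_simpletod_sequence(
      context_turns=["<user> I need a cheap hotel in the centre."],
      belief_state=[("hotel", "pricerange", "cheap"), ("hotel", "area", "centre")],
      actions=["hotel inform choice", "hotel request stars"],
      response="There are 3 cheap hotels in the centre. Any preference on star rating?",
  ))
  ```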
- [2020/05] Language Models are Few-Shot Learners : GPT-3 [Dataset]
  - Meta-learning via in-context learning: repeated sub-tasks embedded within a single sequence.
  - The basic pre-training approach, including model, data, and training, is similar to the process described in RWC+19.
  - Systematically explores different settings for learning within the context (zero-shot, one-shot, and few-shot).
  - Trains GPT-3, an autoregressive language model with 175 billion parameters (10x larger than any previous non-sparse model).
  - Evaluated mainly in the few-shot setting; for all tasks, GPT-3 is applied without any gradient updates or fine-tuning. See the sketch below.
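  A minimal sketch of few-shot in-context learning: K demonstrations are placed in the prompt and a frozen autoregressive LM completes the next answer with no gradient updates. GPT-3 itself is not openly downloadable, so the public GPT-2 checkpoint from Hugging Face transformers is used here purely as a stand-in; the prompt format is an assumption modeled on the paper's translation demonstrations.

  ```python
  # Few-shot prompting of a frozen causal LM (GPT-2 as a stand-in for GPT-3).
  from transformers import AutoModelForCausalLM, AutoTokenizer

  def few_shot_prompt(examples, query, task="Translate English to French"):
      """Concatenate a task description, K demonstrations, and the new query."""
      demos = "\n".join(f"{src} => {tgt}" for src, tgt in examples)
      return f"{task}:\n{demos}\n{query} =>"

  prompt = few_shot_prompt(
      examples=[("sea otter", "loutre de mer"), ("cheese", "fromage")],
      query="peppermint",
  )

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  inputs = tokenizer(prompt, return_tensors="pt")
  # No fine-tuning: the frozen model conditions on the demonstrations at inference time.
  outputs = model.generate(**inputs, max_new_tokens=8, pad_token_id=tokenizer.eos_token_id)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```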
- [2019/02] Language Models are Unsupervised Multitask Learners : GPT-2
- [2019/04] Language Models with Transformers
- [2019/08] Neural Text Generation with Unlikelihood Training
- [2019/01] Cross-lingual Language Model Pretraining : XLM
- [2019/01] Multi-Task Deep Neural Networks for Natural Language Understanding : MT-DNN
- [2019/01] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context : Transformer-XL
- [2019/06] XLNet: Generalized Autoregressive Pretraining for Language Understanding : XLNet
- [2019/04] The Curious Case of Neural Text Degeneration
- [2019/04] Mask-Predict: Parallel Decoding of Conditional Masked Language Models : Mask-Predict
- [2019/09] Fine-Tuning Language Models from Human Preferences
- [2019/01] BioBERT: a pre-trained biomedical language representation model for biomedical text mining : BioBERT
- [2019/03] SciBERT: A Pretrained Language Model for Scientific Text : SciBERT
- [2019/04] ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission : ClinicalBERT
- [2019/06] HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization : HIBERT
- [2019/07] SpanBERT: Improving Pre-training by Representing and Predicting Spans : SpanBERT
- [2019/04] Publicly Available Clinical BERT Embeddings
- [2019/08] Pre-Training with Whole Word Masking for Chinese BERT
- [2019/07] Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
- [2019/07] R-Transformer: Recurrent Neural Network Enhanced Transformer : R-Transformer
- [2019/07] ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation : ReCoSa
- [2019/07] RoBERTa: A Robustly Optimized BERT Pretraining Approach : RoBERTa
- [2019/09] FreeLB: Enhanced Adversarial Training for Language Understanding : FreeLB
- [2019/09] Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks
- [2019/10] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer : T5
- [2019/08] Zero-shot Word Sense Disambiguation using Sense Definition Embeddings
- [2019/06] Bridging the Gap between Training and Inference for Neural Machine Translation
- [2019/06] Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts
- [2019/07] A Simple Theoretical Model of Importance for Summarization
- [2019/05] Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
- [2019/07] We need to talk about standard splits
- [2019/07] ERNIE 2.0: A Continual Pre-training Framework for Language Understanding : ERNIE 2.0
- [2019/05] SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems : SuperGLUE
- [2019/10] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter : DistilBERT
- [2019/10] TinyBERT: Distilling BERT for Natural Language Understanding : TinyBERT
- [2019/11] Not Enough Data? Deep Learning to the Rescue!
- [2019/11] DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation : DialoGPT
- [2018/12] Conditional BERT Contextual Augmentation
- [2018/01] Universal Language Model Fine-tuning for Text Classification : ULMFiT
- [2018/02] Deep contextualized word representations : ELMo
- [2018/06] Improving Language Understanding by Generative Pre-Training : GPT-1
- [2018/07] Subword-level Word Vector Representations for Korean
- [2018/10] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding : BERT
- [2017/07] Attention Is All You Need : Transformer
- [2017/08] Learned in Translation: Contextualized Word Vectors : CoVe
- [2016/06] Siamese CBOW: Optimizing Word Embeddings for Sentence Representations : Siamese CBOW
- [2016/07] Enriching Word Vectors with Subword Information : fastText
- [2014/01] Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation : seq2Seq + GRU
- [2014/07] GloVe: Global Vectors for Word Representation : GloVe
- [2014/09] Sequence to Sequence Learning with Neural Networks : seq2seq
- [2014/12] Dependency-Based Word Embeddings
- [1997/12] Long Short-term Memory : LSTM