- cleanlab: finding label errors in datasets and learning with noisy labels. https://pypi.org/project/cleanlab/ (code: https://github.com/cgnorthcutt/cleanlab)
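A minimal usage sketch, assuming cleanlab >= 2.0 (where `cleanlab.filter.find_label_issues` exists) and scikit-learn for out-of-sample predicted probabilities; the toy data and the logistic-regression model are illustrative, not part of the library's docs.

```python
# Flag likely label errors with cleanlab, using out-of-sample
# predicted probabilities from cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

X = np.random.randn(1000, 20)           # toy features (illustrative)
labels = np.random.randint(0, 3, 1000)  # possibly noisy labels

# Out-of-sample class probabilities, required by cleanlab.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels,
    cv=5, method="predict_proba",
)

# Indices of examples whose given label looks wrong, most suspicious first.
issues = find_label_issues(
    labels=labels, pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)
print(issues[:10])
```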
- Data Noising as Smoothing in Neural Network Language Models. ICLR 2017
Noise Contrastive Estimation
- Noise-contrastive estimation: A new estimation principle for unnormalized statistical models (Gutmann & Hyvärinen, AISTATS 2010)
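To make the principle concrete, here is a toy PyTorch sketch of one common NCE formulation (k noise samples per data point): fit the parameters and the normalizing constant of an unnormalized 1-D Gaussian by logistic discrimination against standard-normal noise. All names (`log_model`, `log_noise`, `k`) and the toy setup are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

# Unnormalized model: log p_m(x) = -0.5 * tau * (x - mu)^2 + c,
# where c is a *learned* log normalizing constant (the point of NCE).
mu = torch.zeros(1, requires_grad=True)
log_tau = torch.zeros(1, requires_grad=True)
c = torch.zeros(1, requires_grad=True)

def log_model(x):
    return -0.5 * log_tau.exp() * (x - mu) ** 2 + c

def log_noise(x):  # known noise density: standard normal
    return -0.5 * x ** 2 - 0.5 * torch.log(torch.tensor(2 * torch.pi))

k = 10                                  # noise samples per data point
x_data = 2.0 + 0.5 * torch.randn(512)   # "true" data: N(2, 0.5^2)
opt = torch.optim.Adam([mu, log_tau, c], lr=0.05)

for step in range(500):
    x_noise = torch.randn(512 * k)      # fresh noise each step
    log_k = torch.log(torch.tensor(float(k)))
    # Classifier logit G(u) = log p_m(u) - log(k * p_n(u)):
    # the log-odds that u came from the data rather than the noise.
    g_data = log_model(x_data) - log_noise(x_data) - log_k
    g_noise = log_model(x_noise) - log_noise(x_noise) - log_k
    # NCE loss: data labeled 1, noise labeled 0.
    loss = -(F.logsigmoid(g_data).mean() + k * F.logsigmoid(-g_noise).mean())
    opt.zero_grad(); loss.backward(); opt.step()

print(mu.item(), log_tau.exp().rsqrt().item())  # should approach 2.0 and 0.5
```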
- Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks (Rao et al., CIKM 2016). Three negative-sampling strategies:
- Random Sampling. We randomly select a number of negative samples for each positive answer.
- Max Sampling. We select the most difficult negative samples. In each epoch, we compute the similarities between all (p+, p−) pairs with the model trained in the previous epoch, then pick the negative answers with the highest similarity to the positive answer.
- Mix Sampling. We take advantage of both random sampling and max sampling by selecting half of the samples with each strategy, as sketched below.
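A minimal sketch of the three strategies. Cosine similarity over embeddings stands in for the paper's model score, and all names and shapes (`sample_negatives`, `pos_emb`, `neg_embs`) are illustrative assumptions, not the authors' code.

```python
import torch

def sample_negatives(pos_emb, neg_embs, n, strategy="mix"):
    """pos_emb: (d,) embedding of the positive answer;
    neg_embs: (N, d) embeddings of the candidate negatives;
    returns indices of n selected negatives."""
    if strategy == "random":
        # Random sampling: uniform choice from the negative pool.
        return torch.randperm(neg_embs.size(0))[:n]
    # Similarity of every negative to the positive, under the
    # (stand-in) previous-epoch model.
    sims = torch.nn.functional.cosine_similarity(
        neg_embs, pos_emb.unsqueeze(0), dim=1)
    if strategy == "max":
        # Max sampling: hardest negatives = highest similarity.
        return sims.topk(n).indices
    if strategy == "mix":
        # Mix sampling: half hardest, half random
        # (possible duplicates between halves ignored for brevity).
        half = n // 2
        hard = sims.topk(half).indices
        rand = torch.randperm(neg_embs.size(0))[: n - half]
        return torch.cat([hard, rand])
    raise ValueError(strategy)

# Usage: pick 8 negatives for one positive answer.
pos = torch.randn(64)
pool = torch.randn(1000, 64)
idx = sample_negatives(pos, pool, 8, strategy="mix")
```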