Skip to content

Latest commit

 

History

History
 
 

longer_names_ner

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Longer Names for testing NER 🦎 + ⌨️ → 🐍

This is an example perturbation used to demonstrate how to add noise to a named-entity tagging problem.

What type of a transformation is this?

This transformation acts like a perturbation to test robustness. (Neil Armstrong was the first to walk on the moon., B-PER, I-PER, O, O, O, O, O, O, O, O) --> (Neil D. M. Armstrong was the first to walk on the moon., B-PER, I-PER, I-PER, I-PER, O, O, O, O, O, O, O, O)

What tasks does it intend to benefit?

This perturbation would benefit all tasks which have a sentence/paragraph/document as input like text classification, text generation, and most importantly a tagging task. This would help augment data for an NER task by keeping the labels still aligned.

What are the limitations of this transformation?

The transformation's outputs are too simple to be used for data augmentation and has been used for demonstration. Unlike a paraphraser, it is not capable of generating linguistically diverse text.

The accuracy of a BERT-base model (fine-tuned on conll2003) (model: "dslim/bert-base-NER") on a subset of conll2003 (20%) validation dataset = 81.364% The accuracy of the same model on the perturbed set = 70.911%