yoonichoi/NoiseInjection-TextClassification

Exploring Noise Injection for Text Classification

Building upon our comprehensive reimplementation of AEDA: An Easier Data Augmentation Technique for Text Classification, we extended our inquiry into advanced data augmentation techniques for text classification in natural language processing (NLP). We explored the injection of alphabet and numerical noise, in addition to AEDA's foundational technique of punctuation mark insertion.
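
As a rough sketch of the idea (not the project's actual implementation; the function and parameter names insert_noise, add_ratio, and charset below are illustrative assumptions, see the code/ folder for the real augmentation scripts), the snippet below randomly inserts tokens from a chosen character set between the words of a sentence, covering AEDA's punctuation insertion as well as the alphabet and numerical noise variants:

import random
import string

# Hypothetical sketch of AEDA-style random insertion, extended from punctuation
# marks to alphabet characters and digits. Names and defaults are illustrative,
# not taken from the code/ folder.
PUNCTUATIONS = [".", ";", "?", ":", "!", ","]

def insert_noise(sentence, add_ratio=0.3, charset=PUNCTUATIONS, seed=None):
    """Randomly insert tokens from `charset` between the words of `sentence`."""
    rng = random.Random(seed)
    words = sentence.split()
    # The number of insertions is random and proportional to the sentence length.
    n_insert = rng.randint(1, max(1, int(add_ratio * len(words))))
    for _ in range(n_insert):
        pos = rng.randint(0, len(words))        # any gap, including either end
        words.insert(pos, rng.choice(charset))
    return " ".join(words)

if __name__ == "__main__":
    text = "the movie was surprisingly good"
    print(insert_noise(text, seed=0))                                        # punctuation noise (AEDA)
    print(insert_noise(text, charset=list(string.ascii_lowercase), seed=1))  # alphabet noise
    print(insert_noise(text, charset=list(string.digits), seed=2))           # numerical noise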

To view information on the AEDA reimplementation, click here
Link to our Poster and Report


Repository Structure

├── aeda
├── code
├── data
├── experiments
│   ├── addratio_experiment
│   ├── bert
│   ├── increments_experiment
│   └── numaug_experiment
└── reproduce_fig2
  • aeda and data are from the original AEDA repo
  • code contains the augmentation code, which you can apply to your own data
  • experiments contains the code we used to run the different experiments for our project; refer to each folder's README for the hyperparameter settings used
  • reproduce_fig2 is for our AEDA reimplementation task

Results from our experiments

You can find individual plots at higher resolution in the outputs/[runname]/plots folder for each experiment.

  • Add Ratio Experiment
  • Increments Experiment
  • Number of Augmentations Experiment

To run experiments

  1. Set up requirements
pip install -r requirements.txt
  2. Download glove.840B.300d to the word2vec/ folder
wget https://nlp.stanford.edu/data/glove.840B.300d.zip && unzip glove.840B.300d.zip
mkdir word2vec 
mv glove.840B.300d.txt word2vec/ && rm glove.840B.300d.zip
  3. cd to the experiment folder you want to run.
cd experiments/[experiment_folder]
  4. Process data for training; this produces appropriate augmented data for the experiment, on top of the original training data. Refer to data_process.py to check which augmentations will be created.
python data_process.py
  5. Run the experiments.
python train_eval.py --seed 0 --runname myrun

train_eval.py takes three arguments: seed, runname, and analyze. If you don't specify runname, experiment results are automatically saved under a folder named with the current time.
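
For example, with runname omitted, the results of the run below are saved in a timestamped folder under outputs/:
python train_eval.py --seed 0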

  6. (Optional) Run the command below to create a figure from the experiment results, specifying the runname under outputs/ that you want to plot.
python plot_individual.py myrun

About

Fall 2023 ANLP Final Project: Exploring Noise Injection for Text Classification
