Building upon our comprehensive reimplementation of AEDA: An Easier Data Augmentation Technique for Text Classification, we extended our inquiry into advanced data augmentation techniques for text classification in natural language processing (NLP). We explored the integration of alphabet and numerical noise, in addition to AEDA’s foundational technique of punctuation mark insertion.
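As a rough illustration of the idea, the sketch below inserts random tokens at random positions in a sentence. The punctuation set and the "between one and about a third of the sentence length" insertion count follow the AEDA paper. The `ALPHABET` and `DIGITS` vocabularies, the `insert_noise` name, and its parameters are assumptions made for this example and may differ from the code in this repo.

```python
import random

# Punctuation marks used by AEDA for insertion.
PUNCTUATION = [".", ",", "!", "?", ";", ":"]
# Hypothetical noise vocabularies for the alphabet and numeric variants.
ALPHABET = list("abcdefghijklmnopqrstuvwxyz")
DIGITS = list("0123456789")


def insert_noise(sentence, tokens=PUNCTUATION, ratio=1 / 3, rng=None):
    """Insert between 1 and int(ratio * n_words) random tokens at random positions."""
    if rng is None:
        rng = random.Random()
    words = sentence.split()
    if not words:
        return sentence
    # Always insert at least one token, even for very short sentences.
    n_insert = rng.randint(1, max(1, int(ratio * len(words))))
    for _ in range(n_insert):
        pos = rng.randint(0, len(words))  # position len(words) appends at the end
        words.insert(pos, rng.choice(tokens))
    return " ".join(words)


if __name__ == "__main__":
    text = "the movie was surprisingly good and quite funny"
    print(insert_noise(text))                   # punctuation noise (AEDA)
    print(insert_noise(text, tokens=ALPHABET))  # alphabet noise (assumed variant)
    print(insert_noise(text, tokens=DIGITS))    # numeric noise (assumed variant)
```

Passing a seeded `random.Random` as `rng` makes the augmentation reproducible, which helps when comparing runs across seeds.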
To view information on AEDA Reimplementation, click here
Link to our Poster and Report
├── aeda
├── code
├── data
├── experiments
│ ├── addratio_experiment
│ ├── bert
│ ├── increments_experiment
│ └── numaug_experiment
└── reproduce_fig2
- aeda and data are from the original AEDA repo
- code includes augmentation code you can apply on your own data
- experiments includes the code we used to run different experiments for our project. Refer to each folder's README for the hyperparameter settings used
- reproduce_fig2 is for our AEDA reimplementation task
You can find individual plots with better resolution in the outputs/[runname]/plots folder for each experiment
- Set up requirements
pip install -r requirements.txt
- Download glove.840B.300d to the word2vec/ folder
wget https://nlp.stanford.edu/data/glove.840B.300d.zip && unzip glove.840B.300d.zip
mkdir word2vec
mv glove.840B.300d.txt word2vec/ && rm glove.840B.300d.zip
- cd to the experiment folder you want to run.
cd experiments/[experiment_folder]
- Process data for training; this produces appropriate augmented data for the experiment, on top of the original training data. Refer to data_process.py to check which augmentations will be created.
python data_process.py
- Run the experiments.
python train_eval.py --seed 0 --runname myrun
train_eval.py takes three arguments: seed, runname, and analyze. If you don't specify runname, the experiment results are automatically saved under a folder named with the current time.
- (Optional) Run the command below to create figures from the experiment results, specifying the runname in outputs/ that you want to plot.
python plot_individual.py myrun
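For orientation, the three train_eval.py arguments described above could be parsed roughly as follows. This is only a sketch: the `--seed` and `--runname` flags appear in the command shown earlier, but treating `analyze` as a boolean switch, the default seed, and the timestamp format are assumptions, and the actual script may differ.

```python
import argparse
from datetime import datetime


def parse_args(argv=None):
    """Parse seed, runname, and analyze (illustrative; not the repo's actual parser)."""
    parser = argparse.ArgumentParser(description="Train and evaluate (sketch).")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    parser.add_argument("--runname", type=str, default=None,
                        help="folder name under outputs/")
    # Assumption: analyze is an on/off switch; the real script may define it differently.
    parser.add_argument("--analyze", action="store_true")
    args = parser.parse_args(argv)
    if args.runname is None:
        # Fall back to a timestamp so each unnamed run gets its own folder.
        # The exact format here is assumed.
        args.runname = datetime.now().strftime("%Y%m%d_%H%M%S")
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"seed={args.seed} runname={args.runname} analyze={args.analyze}")
```

Defaulting runname to a timestamp avoids overwriting earlier results when the argument is omitted.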