See the readme from our ICLR 2021 work for details on setting up the basic training pipeline.
- Download and normalize datasets
make download-datasets
make normalize-datasets
- Create the transformed datasets
make apply-transforms-sri-py150
make apply-transforms-csn-python
make extract-transformed-tokens
- Train a normal seq2seq model for 10 epochs on
sri/py150
bash experiments/normal_seq2seq_train.sh
- Run adversarial training and testing on
sri/py150
for 5 epochs
bash experiments/normal_adv_train.sh
- Get the augmented
csn-python
datasets with random and adversarial views
bash scripts/augment.sh
- Pretrain a seq2seq encoder on a
csn-python
augmented dataset, finetune the encoder onsri/py150
, and test the final model on normal and adversarial datasets.
bash experiments/finetune_and_test_0.sh
- Pretrain a seq2seq encoder on
csn-python
and run adversasrial training starting from the pretrained model.
bash experiments/pretrain_adv_train.sh
Get docker to use an external disk to write intermediate containers.
See https://www.guguweb.com/2019/02/07/how-to-move-docker-data-directory-to-another-location-on-ubuntu/ for details.
Steps involved are
- sudo service docker stop
- Using your preferred text editor add a file named daemon.json under the directory /etc/docker. The file should have this content:
{
"data-root": "/path/to/your/docker"
}
- sudo rsync -aP /var/lib/docker/ /path/to/your/docker
- sudo mv /var/lib/docker /var/lib/docker.old
- sudo service docker start
- sudo rm -rf /var/lib/docker.old