This repo contains code and data for reproducing the LSTM results in Self-Training for Compositional Neural NLG in Task-Oriented Dialogue. It was originally used for pull request #5 of facebookresearch/TreeNLG, and was later reborn as a branch based on commit e66e012 of facebookresearch/TreeNLG for further research. The BART version is at znculee/TreeNLG-BART.
TBD
In addition to the weather and enriched E2E challenge datasets from our paper, we release another dataset, weather_challenge, which contains harder weather scenarios in its train/val/test files. Each response was collected by providing annotators, who are native English speakers, with a user query and a compositional meaning representation (with discourse relations and dialog acts). All of these are made available in our dataset. See our linked paper for more details.
Dataset | Train | Val | Test | Disc_Test |
---|---|---|---|---|
Weather | 25390 | 3078 | 3121 | 454 |
Weather_Challenge | 32684 | 3397 | 3382 | - |
E2E | 42061 | 4672 | 4693 | 230 |
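To sanity-check these split sizes against the released files, a loop like the one below works, assuming one example per line. The directory layout and file extension (`data/weather/<split>.tsv`) are assumptions, so adjust them to the actual release.

```bash
# Count examples per split; paths and extension are assumptions, adjust as needed.
for split in train val test disc_test; do
  printf '%-10s %s\n' "$split" "$(wc -l < data/weather/${split}.tsv)"
done
```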
`Disc_Test` is a more challenging subset of our test set that contains discourse relations; it is also the subset for which we report results in the `Disc` column of Table 7 in our paper. Note that the statistics above differ slightly from those in our paper; please use the statistics above.
Note: some responses in the `Weather` dataset come without a user query (141/17/18/4 for train/val/test/disc_test, respectively). We simply use a "placeholder" token for these missing user queries.
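To locate those responses, one can grep for the token; the pattern and path below are assumptions about the file layout, not part of the release.

```bash
# Count lines containing the "placeholder" token (path and format assumed)
grep -c 'placeholder' data/weather/train.tsv
```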
fairseq must be installed first, following the Requirements and Installation instructions of fairseq. The code has been tested against commit `e9014fb` of fairseq.
```bash
# Create and activate a conda environment
conda create -n treenlg python=3.7 pip
conda activate treenlg
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch

# Clone this repo and install fairseq at the tested commit
git clone https://github.com/znculee/TreeNLG.git
cd TreeNLG
git clone https://github.com/pytorch/fairseq.git
cd fairseq
git checkout -b treenlg e9014fb
pip install -e .
cd ..
```
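A quick way to confirm the environment is wired up (these checks are ours, not part of the original scripts):

```bash
# Verify the editable fairseq install and that PyTorch can see the GPU
python -c "import fairseq; print(fairseq.__version__)"
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```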
```bash
# Preprocess the weather dataset, then train and decode the LSTM model
bash scripts/prepare.weather.sh
bash scripts/train.weather.lstm.sh
bash scripts/generate.weather.lstm.sh
```
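For orientation, the three scripts roughly wrap the standard fairseq pipeline. The sketch below is illustrative only; the actual arguments, architectures, and paths are defined inside `scripts/*.sh` and will differ.

```bash
# Illustrative sketch of the underlying fairseq pipeline; the flags and
# paths here are assumptions, not the repo's exact configuration.
fairseq-preprocess --source-lang src --target-lang tgt \
  --trainpref data/weather/train --validpref data/weather/val \
  --testpref data/weather/test --destdir data-bin/weather

fairseq-train data-bin/weather --arch lstm \
  --optimizer adam --lr 1e-3 --max-tokens 4096 \
  --save-dir checkpoints/weather.lstm

fairseq-generate data-bin/weather \
  --path checkpoints/weather.lstm/checkpoint_best.pt \
  --beam 5 --batch-size 32
```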
The BLEU score is calculated on the output text only, without any of the tree information. `+replfail` denotes evaluating the constrained-decoding generations after replacing failure cases with the corresponding unconstrained-decoding generations. We use the BLEU evaluation script provided for the E2E challenge here.
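As a concrete illustration of scoring the surface text only, one can strip the bracketed tree tokens before computing BLEU. The token pattern and file names below are assumptions, and `measure_scores.py` refers to the E2E challenge metrics repository:

```bash
# Strip tree tokens (assumed to look like "[LABEL ... ]") from the
# hypotheses, then score with the E2E challenge script (file names assumed)
sed -E 's/\[[A-Z_0-9]+ //g; s/ ?\]//g; s/ +/ /g' hyp.tree.txt > hyp.txt
python e2e-metrics/measure_scores.py ref.txt hyp.txt
```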
Dataset | Method | BLEU (disc) | TreeAcc (disc) | BLEU (no-disc) | TreeAcc (no-disc) | TreeAcc (whole)
--- | --- | --- | --- | --- | --- | ---
Weather | S2S-Tree | 74.51 | 89.65 | 76.34 | 94.17 | 93.59
| +constr | 75.41 | 100.0 | 76.88 | 99.84 | 99.86
| +replfail | 75.41 | 100.0 | 77.38 | 99.84 | 99.86
Weather_Challenge | S2S-Tree | N/A | N/A | 77.79 | 94.09 | N/A
| +constr | N/A | N/A | 78.52 | 99.91 | N/A
| +replfail | N/A | N/A | 79.02 | 99.91 | N/A
E2E | S2S-Tree | 66.70 | 62.17 | 77.37 | 96.72 | 95.10
| +constr | 64.32 | 99.13 | 77.44 | 99.89 | 99.86
| +replfail | 65.38 | 99.13 | 77.43 | 99.89 | 99.86
Please refer to self_training/README.md to reproduce the results of the self-training experiments in the paper.