Note: this repository was created solely for the purpose of organizing and submitting our finalized project. The commits made to this repository do not accurately represent the individual contributions from each member.
- 20160811 Jeongeon Park
- 20160830 Suro Lee
- 20170798 Seungho Kim
- 20170828 Chanhee Lee
In this project, we propose a search-based approach that automatically generates high-quality chatbot test input. Our approach uses the Metropolis-Hastings algorithm, building on the work of N. Miao et al. (2019), to generate input data in the form of questions. We compare the generated test input against human-generated test input, both directly and through the chatbot's responses to each, and show that the test input generated by our approach is more diverse and more relevant to the topic keyword.
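For context, the sampling loop at the heart of this approach follows the CGMH idea: start from a sentence containing the topic keyword, repeatedly propose word-level edits (replace/insert/delete), and accept or reject each proposal with the Metropolis-Hastings criterion, scoring fluency with the forward/backward language models. The sketch below is only illustrative; `sentence_score` and `propose_edit` are placeholders, not functions from this repository.

```python
import random

def mh_generate(seed_sentence, sentence_score, propose_edit, n_steps=500):
    """Illustrative Metropolis-Hastings loop for keyword-constrained sentence editing.

    sentence_score(s): unnormalized fluency score of sentence s (placeholder for the
    forward/backward language-model score). propose_edit(s): returns a candidate
    sentence plus the proposal probabilities q(candidate | s) and q(s | candidate).
    """
    current = seed_sentence
    for _ in range(n_steps):
        candidate, q_forward, q_backward = propose_edit(current)
        # MH acceptance ratio:
        #   min(1, p(candidate) * q(current | candidate) / (p(current) * q(candidate | current)))
        accept = min(1.0, (sentence_score(candidate) * q_backward) /
                          (sentence_score(current) * q_forward + 1e-12))
        if random.random() < accept:
            current = candidate
    return current
```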
- python 3.8
- Training and generation
  - TensorFlow == 2.3.1 (other versions are not tested)
  - numpy
  - pickle
- Evaluation
  - spaCy
    - after installing, run `python -m spacy download en_core_web_lg` to download the required model
  - gensim
  - pandas
  - nltk
To use a pre-trained language model, download the `forward` and `backward` folders into `model`.
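How these folders are loaded is internal to the scripts; the snippet below is only a quick sanity check, under the assumption that the scripts expect `model/forward` and `model/backward` to exist, that the download landed in the right place.

```python
from pathlib import Path

# Assumption: the scripts look for model/forward and model/backward.
for name in ("forward", "backward"):
    folder = Path("model") / name
    if not folder.is_dir():
        raise FileNotFoundError(f"missing pre-trained folder: {folder}")
```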
$ python model/train.py [-h] [--backward] [-e EPOCH] [-b BATCH]
- `-h, --help`: shows the help message and exits
- `--backward`: include this argument to train the backward model (instead of the forward model)
- `-e EPOCH, --epoch EPOCH`: sets the maximum number of epochs to run (type: int, default: 100)
- `-b BATCH, --batch BATCH`: sets the batch size (type: int, default: 32)
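These flags match a standard argparse interface; a minimal sketch of how they could be declared (not necessarily the exact code in `model/train.py`) looks like this:

```python
import argparse

parser = argparse.ArgumentParser(description="Train the forward or backward language model")
parser.add_argument("--backward", action="store_true",
                    help="train the backward model (instead of the forward model)")
parser.add_argument("-e", "--epoch", type=int, default=100,
                    help="maximum number of epochs to run")
parser.add_argument("-b", "--batch", type=int, default=32,
                    help="batch size")
args = parser.parse_args()
print(args.backward, args.epoch, args.batch)
```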
Optional: insert your own keywords (from which the questions are generated) into `data/input/keywords.txt`.
$ python model/questions_gen.py
Generated questions are written into `data/output/output.txt`.
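The expected I/O is simple: one keyword per line in `data/input/keywords.txt`, one generated question per line in `data/output/output.txt`. The sketch below only illustrates that file handling; the keyword-per-line format is our reading of the description above, and `generate_questions` is a placeholder for the MH-based generator, not the real function name.

```python
def write_questions(generate_questions,
                    keywords_path="data/input/keywords.txt",
                    output_path="data/output/output.txt"):
    # Assumed format: one topic keyword per line.
    with open(keywords_path, encoding="utf-8") as f:
        keywords = [line.strip() for line in f if line.strip()]
    with open(output_path, "w", encoding="utf-8") as out:
        for keyword in keywords:
            # generate_questions(keyword) is a placeholder for the sampler.
            for question in generate_questions(keyword):
                out.write(question + "\n")
```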
The file `evaluate/diversity.py` is used to evaluate both [1] the generated questions and [2] the chatbot's responses.
To evaluate the generated questions [1]:
- Generate the questions file `data/output/output.txt`.
- Use this file's relative path as the `file` argument.
To evaluate the chatbot's responses [2], we used Pandorabots' Kuki as our test chatbot:
- Enter each question into the chat, and download the conversation as a `.json` file.
- Parse the conversation using `evaluate/parseMessages.py` (for usage details, add the `--help` argument); a rough sketch of this step is shown after this list.
- Use the parsed file's relative path as the `file` argument.
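The exact `.json` schema exported from the chat is not documented here, so the sketch below only shows the kind of transformation `evaluate/parseMessages.py` performs (JSON conversation in, one message per line out) under a hypothetical `messages`/`text` layout; it is not the actual parser.

```python
import json
import sys

def parse_conversation(json_path, txt_path):
    """Write one chat message per line; the JSON layout used below is hypothetical."""
    with open(json_path, encoding="utf-8") as f:
        conversation = json.load(f)
    with open(txt_path, "w", encoding="utf-8") as out:
        for message in conversation.get("messages", []):
            out.write(message.get("text", "").strip() + "\n")

if __name__ == "__main__":
    parse_conversation(sys.argv[1], sys.argv[2])  # e.g. message_1.json data.txt
```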
$ python evaluate/diversity.py [-h] [--output] [-a A] [-b B] file
- `file`: relative path of the `.txt` file to be used for evaluation
- `-h, --help`: shows the help message and exits
- `--output`: add this argument to evaluate generated questions (instead of a chatbot conversation)
- `-a A`: (only for chatbot responses) index of the first message to evaluate (type: int, default: 0)
- `-b B`: (only for chatbot responses) index of the last message to evaluate (type: int, default: last index)
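The script reports how lexically diverse the evaluated sentences are. The exact metric is defined in `evaluate/diversity.py`; purely as an illustration of the general idea, a common choice is distinct-n, the fraction of unique n-grams among all n-grams:

```python
def distinct_n(sentences, n=2):
    """Fraction of unique n-grams across all sentences (illustration only)."""
    ngrams = []
    for sentence in sentences:
        tokens = sentence.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Higher means more diverse; e.g.:
print(distinct_n(["do you like sports", "which sport do you play"], n=2))
```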
The file `evaluate/topic_relevance.py` is used to evaluate the generated questions.
- Generate the questions file `data/output/output.txt`.
- Divide the sentences generated in `output.txt` by the keyword used, and place them in the arrays `input_text1`, `input_text2`, and `input_text3`.
  - Only three topics can be covered during one evaluation run.
$ python evaluate/topic_relevance.py [-h] keyword
- `keyword`: questions generated with this keyword will be chosen from `output.txt` to evaluate relevance
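Since the spaCy `en_core_web_lg` model (which ships word vectors) is a dependency, one plausible way to score relevance is vector similarity between the keyword and each generated question. The sketch below illustrates that idea; it is not necessarily the exact computation in `evaluate/topic_relevance.py`.

```python
import spacy

# Requires: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

def topic_relevance(keyword, questions):
    """Mean vector similarity between the keyword and each question (illustration only)."""
    keyword_doc = nlp(keyword)
    scores = [keyword_doc.similarity(nlp(q)) for q in questions]
    return sum(scores) / len(scores) if scores else 0.0

# Example with the "sports" keyword used in the commands below.
print(topic_relevance("sports", ["do you like sports", "what game do you watch"]))
```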
Train forward/backward language model
$ python model/train.py
$ python model/train.py --backward
Generate questions
$ python model/questions_gen.py
Evaluate diversity and topic relevance of the generated questions (example with the keyword "sports")
$ python evaluate/diversity.py --output ../data/output/output.txt
$ python evaluate/topic_relevance.py sports
Parse and evaluate diversity of chatbot conversation
$ python evaluate/parseMessages.py message_1.json data.txt
$ python evaluate/diversity.py data.txt