DialogCorpus

A large-scale dialog corpus for training the Next-Gen Dialog System.

How to Use?

First, clone the repository.

# download
git clone https://github.com/qywu/DialogCorpus.git
cd DialogCorpus

You can download and process each dataset manually. For example, for Daily Dialog:

# download data for daily_dialog
python daily_dialog/download_data.py
# process the data
python daily_dialog/process_data.py
# the processed data is stored in {folder_name}/data/{folder_name}.json
vi daily_dialog/data/daily_dialog.json
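The README does not document the schema of the processed file, so the following is only a minimal sketch for inspecting it without assuming a record layout. The helper names `describe_structure` and `summarize_processed_file` are hypothetical, and reading the file as UTF-8 is an assumption.

```python
import json
from pathlib import Path
from typing import Any


def describe_structure(data: Any) -> dict[str, Any]:
    """Describe the top level of a parsed JSON value without assuming a schema."""
    summary: dict[str, Any] = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    if isinstance(data, dict):
        # A few keys are usually enough to see how the file is organized.
        summary["sample_keys"] = list(data)[:5]
    return summary


def summarize_processed_file(path: str) -> dict[str, Any]:
    """Load a processed dataset file (assumed UTF-8 JSON) and describe it."""
    with Path(path).open(encoding="utf-8") as f:
        return describe_structure(json.load(f))
```

Run from the repository root, `summarize_processed_file("daily_dialog/data/daily_dialog.json")` gives a quick overview that may be more practical than opening a large file in an editor.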

Or download, process, and join all datasets with a single command:

python prepare_all_data.py \
       --download \
       --process \
       --join
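After running the command, one way to see which datasets produced output is to look for files that follow the `{folder_name}/data/{folder_name}.json` convention shown for `daily_dialog`. This is a sketch under the assumption that every dataset folder follows that same layout (the README only shows the one example); the folder names below are taken from the repository, and the helpers `processed_path` and `report_processed` are hypothetical.

```python
from pathlib import Path

# Dataset folder names as they appear in the repository. Whether each one
# follows the daily_dialog layout is an assumption.
DATASETS = [
    "CCC", "CCPE", "cornell_movie", "daily_dialog", "frames",
    "persona_chat", "schema_dialog", "self_dialog", "taskmaster",
]


def processed_path(name: str, root: Path = Path(".")) -> Path:
    """Return the expected location of a dataset's processed JSON file."""
    return root / name / "data" / f"{name}.json"


def report_processed(names: list[str], root: Path = Path(".")) -> dict[str, bool]:
    """Map each dataset name to whether its processed file exists on disk."""
    return {name: processed_path(name, root).is_file() for name in names}
```

From the repository root, `report_processed(DATASETS)` returns `True` for every dataset whose processed file was found.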

Detailed Dialog Processing for each dataset:

Daily Dialog
- Removed the tokenization spaces around punctuation
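Tokenized corpora such as Daily Dialog separate punctuation with spaces (for example `Say , Jim , how about ... ?`). The repository's exact rules are not described here, so the following is a heuristic regex sketch of this kind of cleanup; `detokenize_punctuation` is a hypothetical name, and the set of punctuation and contraction suffixes it handles is an assumption.

```python
import re

# Split contractions such as "I ’ m" or "don ' t" (straight or curly apostrophe).
_CONTRACTION = re.compile(r"(\w)\s*(['’])\s*(s|t|m|re|ve|ll|d)\b", re.IGNORECASE)
# Whitespace before closing punctuation, e.g. "dinner ?".
_SPACE_BEFORE_CLOSING = re.compile(r"\s+([.,!?;:)\]}])")
# Whitespace after opening brackets, e.g. "( really".
_SPACE_AFTER_OPENING = re.compile(r"([(\[{])\s+")


def detokenize_punctuation(text: str) -> str:
    """Remove tokenization spaces around punctuation (heuristic sketch)."""
    # Re-attach contractions: "I ’ m" -> "I’m", "don ' t" -> "don't".
    text = _CONTRACTION.sub(r"\1\2\3", text)
    # Attach closing punctuation to the preceding word: "dinner ?" -> "dinner?".
    text = _SPACE_BEFORE_CLOSING.sub(r"\1", text)
    # Attach opening brackets to the following word: "( really" -> "(really".
    text = _SPACE_AFTER_OPENING.sub(r"\1", text)
    return text.strip()
```

Rules this simple will not cover every case (quoted speech or Penn Treebank style `do n't`, for instance), which is why it is only a sketch.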
Persona Chat
- Used Hugging Face's version [link]
- Recovered the casing of lowercased utterances
- Removed the tokenization spaces around punctuation
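The README does not say how casing was recovered; it may have come from a cased source rather than a rule. Purely to illustrate the idea, here is a naive heuristic sketch (the name `recase_sentence_starts` is hypothetical). It capitalizes the pronoun "i" and the first letter of each sentence. It cannot restore proper nouns and is not the repository's method. It assumes punctuation spacing has already been cleaned up.

```python
import re

_PRONOUN_I = re.compile(r"\bi\b")
# Start of text, or sentence-final punctuation followed by whitespace,
# then a lowercase letter.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])")


def recase_sentence_starts(text: str) -> str:
    """Naively restore casing in a lowercased utterance."""
    # "i" and its contractions ("i'm", "i've", ...) become "I".
    text = _PRONOUN_I.sub("I", text)
    # Uppercase the first letter of each sentence.
    return _SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)
```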
Cornell Movie Corpus
- Ignored UTF-8 decoding errors
- Extracted character names
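Both Cornell steps can be sketched together. This assumes the standard `movie_lines.txt` layout of the Cornell Movie-Dialogs Corpus, where each line holds a line ID, character ID, movie ID, character name, and text separated by `+++$+++`. The helpers `parse_movie_line` and `read_movie_lines` are hypothetical, and the repository's own script may differ. Decoding with `errors="ignore"` drops invalid UTF-8 bytes instead of raising.

```python
from typing import NamedTuple

FIELD_SEP = "+++$+++"


class MovieLine(NamedTuple):
    line_id: str
    character_id: str
    movie_id: str
    name: str  # character (speaker) name
    text: str


def parse_movie_line(raw: bytes) -> MovieLine:
    """Parse one raw line of movie_lines.txt, skipping invalid UTF-8 bytes."""
    decoded = raw.decode("utf-8", errors="ignore")
    # maxsplit=4 keeps any separator-like text inside the utterance intact.
    fields = [field.strip() for field in decoded.split(FIELD_SEP, maxsplit=4)]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {decoded!r}")
    return MovieLine(*fields)


def read_movie_lines(path: str) -> list[MovieLine]:
    """Read every non-empty line of the file as a MovieLine."""
    with open(path, "rb") as f:
        return [parse_movie_line(line) for line in f if line.strip()]
```

For example, `parse_movie_line(b"L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n").name` is `"BIANCA"`.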
Taskmaster
- Nothing specific
CCPE
- Nothing specific
Frames
- Nothing specific
Chit-Chat Challenge
- Nothing specific
Self-dialogue
- Nothing specific
Schema Dialog
- Nothing specific

Links
