- HifiTTS: a high-resolution multi-speaker English dataset, used here as the baseline. Can be downloaded here.
- Generate phonetic alignments using GlowTTS:
  a) Download the GlowTTS model checkpoint.
  b) Update `GLOW_TTS_CKPT_PATH` in the `compute_glowtts_alignments.py` script.
  c) Prepare a GlowTTS filelist, or use this example for the HiFiTTS dataset (you need to download the dataset first).
  d) Prepare a GlowTTS config, changing:
     - `"training_files"` to your filelist,
     - `"cmudict_path"` to `<nansypp_path>/static/tts/cmu_dictionary`.
  e) Run the alignment script:
     ```bash
     python src/data/preprocessing/compute_glowtts_alignments.py <config_file> <input_dir> <output_dir>
     ```
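The layout of the alignment files in `<output_dir>` is not specified above, so as a hypothetical illustration: if GlowTTS-style alignments are stored as per-phoneme integer durations (in mel frames), they can be expanded into the frame-level phoneme sequence a TTS decoder consumes. All names and values below are invented for the sketch:

```python
import numpy as np

# Hypothetical example: GlowTTS-style alignments as per-phoneme integer
# durations (in mel frames); the exact on-disk format may differ.
durations = np.array([3, 1, 4, 2])      # frames assigned to each phoneme
phoneme_ids = np.array([10, 7, 23, 5])  # token ids for the utterance

# Expand to a frame-level phoneme sequence.
frame_phonemes = np.repeat(phoneme_ids, durations)

# The alignment must cover every mel frame exactly once.
assert frame_phonemes.shape[0] == durations.sum()
print(frame_phonemes)  # [10 10 10  7 23 23 23 23  5  5]
```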
- Decode audio using:
  ```bash
  python src/data/preprocessing/decode.py -i <input_dir> -o <output_dir> -sr 44100
  ```
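The `-sr 44100` flag brings all audio to a uniform 44.1 kHz sample rate. As a rough sketch of what resampling does (the actual script presumably uses a proper polyphase/sinc resampler, not this toy linear interpolation):

```python
import numpy as np

def resample_linear(audio: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Crude linear-interpolation resampler, for illustration only."""
    n_out = int(round(len(audio) * sr_out / sr_in))
    t_in = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    t_out = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio)

# One second of a 440 Hz tone at 22.05 kHz, upsampled to 44.1 kHz.
sr_in, sr_out = 22050, 44100
tone = np.sin(2 * np.pi * 440 * np.arange(sr_in) / sr_in)
upsampled = resample_linear(tone, sr_in, sr_out)
print(len(upsampled))  # 44100
```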
- Compute TTS targets using:
  ```bash
  python -m src.data.preprocessing.precompute_tts_targets \
      <decoded_output_dir>/dataset.csv \
      <sample_rate> \
      <tts_targets_dir> \
      <backbone_exp_dir> \
      <backbone_ckpt_name>
  ```
- Create the train/validation split (the header row is kept in both files; the first 1000 data rows go to validation, the rest to train):
  ```bash
  head -n 1001 <tts_targets_dir>/dataset.csv > <tts_targets_dir>/validation_dataset.csv
  head -n 1 <tts_targets_dir>/dataset.csv > <tts_targets_dir>/train_dataset.csv
  sed -n '1002,$p' <tts_targets_dir>/dataset.csv >> <tts_targets_dir>/train_dataset.csv
  ```
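The shell split above copies the header row into both output files, sends the first 1000 data rows to validation, and appends the remainder to train. A minimal Python sketch of the same logic on a synthetic `dataset.csv` (the column names are invented for the example):

```python
import csv
import io

def split_dataset(csv_text: str, n_val: int = 1000):
    """Replicate the shell split: header row copied into both outputs,
    first n_val data rows to validation, the rest to train."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return [header] + data[:n_val], [header] + data[n_val:]

# Tiny synthetic dataset.csv with 5 data rows, split 2 / 3.
text = "path,text\n" + "\n".join(f"a{i}.wav,hello {i}" for i in range(5)) + "\n"
val, train = split_dataset(text, n_val=2)
print(len(val), len(train))  # 3 4
```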
- Edit the TTS training config: specify `<tts_targets_dir>` and `<alignment_dir>`.
- Run the training script:
  ```bash
  python src/train/tts.py --config-name=hifitts +trainer.devices=<list_of_gpu_ids>
  ```
Run the checkpoint-download script: it downloads a checkpoint we trained with this repository for 200k training steps and places it in the right directory, so that the inference and app steps below work smoothly:

```bash
python src/utilities/download_checkpoints.py
```
An inferencer class is provided in the source code and can be invoked from the command line as follows:

```bash
python src/inference/tts.py \
    <experiment_directory> \
    <checkpoint_filename> \
    <audio_path> \
    <text> \
    <output_path> \
    -d <device>
```
Example:

```bash
python src/inference/tts.py \
    "static/runs/runs_tts/hifitts/2023-10-03_18-23-00" \
    "steps=step=15000.ckpt" \
    "static/samples/vctk/p238_001.wav" \
    "To be or not to be that is the question" \
    "static/tmp/to_be.wav"
```
Launch the demo app with Streamlit:

```bash
streamlit run app/text_to_speech.py --server.port <port_number>
```
During training, you can visualize logs with TensorBoard:

```bash
tensorboard --logdir=static/runs/runs_tts --bind_all --port <port_number>
```
Observations and key R&D results are detailed here.
Results from checkpoints trained with this repo are showcased on this Notion page.