diff --git a/README.md b/README.md
index 8e86d43..7cb6ebc 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ In this example, the following plain text files are necessary:
 Also, there is a `data/ljspeech/phones.txt` file to specify all the phones together with their indexes in dictionary.
 For LJSpeech, we provide the processed file [online](https://huggingface.co/datasets/cantabile-kwok/ljspeech-1024-256-dur/resolve/main/ljspeech-1024-256.zip).
-You can download it and unzip to `data/ljspeech`.
+You can download it and unzip to `data/ljspeech/{train,val}`.
 If you want to train on your own dataset, you might have to create these files yourself (or change the data loading strategy).
 After having these manifest files, please do the following to extract mel-spectrogram for training:
@@ -115,7 +115,7 @@ python inference_dataset.py -c configs/${your_yaml} -m ${model_name} --EMA \
 This will synthesize mel-spectrograms for the validation set in your config, storing them at `synthetic_wav/${model_name}/tts_gt_spk/feats.scp`.
 Speaker, speed and temperature can be specified; see `tools.get_hparams_decode()` function for complete set of options.
-> TODO: VOCODER
+Inference can then be done in the `hifigan/` directory. Please refer to the [README](hifigan/README.md) there.
 ## Acknowledgement
 During the development, the following repositories were referred to:
diff --git a/hifigan/README.md b/hifigan/README.md
new file mode 100644
index 0000000..0888738
--- /dev/null
+++ b/hifigan/README.md
@@ -0,0 +1,18 @@
+# HifiGAN (parallel_wavegan implemented version)
+
+We release the trained checkpoints on LJspeech and LibriTTS here.
+The detailed information is:
+
+| Dataset | Sampling Rate | Hop Size | Window Length | Normed |
+|----------|---------------|----------|---------------|--------|
+| LJSpeech | 16k | 256 | 1024 | True |
+| LibriTTS | 16k | 200 | 800 | True |
+
+The trained checkpoint on both datasets are provided online. You can unzip them to sub-folders in `exp/`.
+
+Vocoding can be done by
+```shell
+cd ../; source path.sh; cd -; # if path.sh not activated
+bash generation.sh --dataset "ljspeech/libritts" --eval_dir /path/that/contains/feats.scp
+```
+The program will read feats.scp in $eval_dir and synthesize audio to save in that dir.
diff --git a/hifigan/cmd.sh b/hifigan/cmd.sh
new file mode 100644
index 0000000..19f3421
--- /dev/null
+++ b/hifigan/cmd.sh
@@ -0,0 +1,91 @@
+# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
+# Usage: .pl [options] JOB=1:
+# e.g.
+# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
+#
+# Options:
+# --time