diff --git a/.gitignore b/.gitignore
index 3eb56cb..029eac4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -150,3 +150,6 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
+synthetic_wav/
+exp/
+**/*.wav
\ No newline at end of file
diff --git a/README.md b/README.md
index 3530de2..e59f2e7 100644
--- a/README.md
+++ b/README.md
@@ -21,11 +21,60 @@ python setup.py build_ext --inplace
 Note that to avoid the trouble of installing [torchdyn](https://github.com/DiffEqML/torchdyn), we directly copy the torchdyn 1.0.6 version here locally at `torchdyn/`.
 The following process may also need `bash` and `perl` commands in your environment.
 
+
 ## Data Preparation
+This repo relies on Kaldi-style data organization.
+All data description files should be put in subdirectories in `data/`.
+See `data/ljspeech/example` for a basic example.
+In this example, the following plain text files are necessary:
+1. `wav.scp`: organized as `utt /path/to/wav`.
+2. `utts.list`: every line specifies an utterance. This can be obtained by `cut -d ' ' -f 1 wav.scp > utts.list`.
+3. `utt2spk`: organized as `utt spk_name`.
+4. `text` and `phn_duration`: specifies the phoneme sequence and the corresponding integer durations (in frames).
+Also, there is a `data/ljspeech/phones.txt` file to specify all the phones together with their indexes in dictionary.
+
+For LJSpeech, we provide the processed file [online](https://huggingface.co/datasets/cantabile-kwok/ljspeech-1024-256-dur/resolve/main/ljspeech-1024-256.zip).
+You can download it and unzip to `data/ljspeech`.
+If you want to train on your own dataset, you might have to create these files yourself (or change the data loading strategy).
+
+After having these manifest files, please do the following to extract mel-spectrogram for training:
+```shell
+bash extract_fbank.sh --stage 0 --stop_stage 2 --nj 16
+# nj: number of parallel jobs.
+# Have a look into the script if you need to change something
+# Bash variables before "parse_options.sh" can be passed by CLI, e.g. "--key value".
+```
+Note that we default to use **16kHz** data here.
+This will create `feats/fbank` and `feats/normed_fbank`, where Kaldi-style scp and ark files store the mel-spectrogram data.
+The normed features will be used for training.
+
+If you want to use speaker-IDs (like LJSpeech, instead of using pretrained speaker embeddings such as xvectors) for training, please run:
+```shell
+make_utt2spk_id.py data/ljspeech/train/utt2spk data/ljspeech/val/utt2spk
+# You can add more files in CLI. Will write utt2num_frames in the same directory to these files.
+```
 ## Training
+Configurations for training is stored as yaml file in `configs/`.
+Data manifests and features for training and validation set will be specified in those yaml files.
+You will need to change double-quoted file paths there if you need to train on your own data.
+
+Then, training is performed by
+```shell
+python train.py -c configs/${your_yaml} -m ${model_name}
+# e.g. python train.py -c configs/lj_16k_gt_dur.yaml -m lj_16k_gt_dur
+```
+It will create `logs/${model_name}` for logging and checkpointing.
+
+Several notes:
+* By default, the program performs EMA to average weights. Weights with or without EMA will both be saved.
+* By default, the program will try to find the latest checkpoint for resuming. EMA checkpoints are prior to non-EMA checkpoints.
+* You can set `use_gt_dur` to `false` to turn on MAS algorithm. In this setting, it is better to set `add_blank` to `true`.
+## Generate Data for ReFlow and Perform Reflow
+TO BE DONE
 ## Inference
+TO BE DONE
 
 ## Acknowledgement
 During the development, the following repositories were referred to:
diff --git a/cmd.sh b/cmd.sh
new file mode 100644
index 0000000..19f3421
--- /dev/null
+++ b/cmd.sh
@@ -0,0 +1,91 @@
+# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
+# Usage: .pl [options] JOB=1:
+# e.g.
+# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
+#
+# Options:
+# --time