additional notes on training
Plachtaa committed Nov 29, 2024
1 parent a516247 commit 1399efc
Showing 2 changed files with 8 additions and 2 deletions.
5 changes: 4 additions & 1 deletion README-ZH.md
@@ -103,6 +103,7 @@ python real-time-gui.py --checkpoint <path-to-checkpoint> --config <path-to-conf
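Here is a simple Colab example for reference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1R1BJTqMsTXZzYAVx3j1BiemFXog9pbQG?usp=sharing)
1. Prepare your dataset. It must satisfy the following requirements:
- File structure does not matter
- Each audio clip must be between 1 and 30 seconds long; otherwise it is ignored automatically
- All audio files must be in one of the following formats: `.wav` `.flac` `.mp3` `.m4a` `.opus` `.ogg`
- Speaker labels are not required, but make sure each speaker has at least 1 utterance
- Of course, the more data you have, the better the model will perform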
@@ -134,7 +135,9 @@ where:
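- `save-every` is the number of steps between model checkpoint saves
- `num-workers` is the number of workers for data loading; setting it to 0 is recommended on Windows

4. After training, you can run inference by specifying the paths to the checkpoint and config file.
4. To resume training from where it last stopped, simply run the same command again. As long as you pass the same `run-name` and `config` arguments, the program can find the checkpoint and logs from the previous run.

5. After training, you can run inference by specifying the paths to the checkpoint and config file.
- They should be under `./runs/<run-name>/`, with the checkpoint named `ft_model.pth` and the config file named the same as the training config file.
- At inference time, you still need to specify a reference audio file for the speaker you want to use, as in zero-shot inference.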

5 changes: 4 additions & 1 deletion README.md
@@ -112,6 +112,7 @@ Fine-tuning on custom data allow the model to clone someone's voice more accurat
A Colab Tutorial is here for you to follow: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1R1BJTqMsTXZzYAVx3j1BiemFXog9pbQG?usp=sharing)
1. Prepare your own dataset. It has to satisfy the following:
- File structure does not matter
- Each audio file should be between 1 and 30 seconds long; otherwise it will be ignored
- All audio files should be in one of the following formats: `.wav` `.flac` `.mp3` `.m4a` `.opus` `.ogg`
- Speaker labels are not required, but make sure that each speaker has at least 1 utterance
- Of course, the more data you have, the better the model will perform
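
The dataset requirements above can be checked before training with a short script. This is a minimal sketch, not the project's own loader: the function names are hypothetical, the duration check only covers `.wav` files (via the standard-library `wave` module), and treating the 1 and 30 second bounds as inclusive is an assumption.

```python
import wave
from pathlib import Path

# Extensions listed in the README as accepted formats.
ALLOWED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a", ".opus", ".ogg"}
# Assumption: both bounds are treated as inclusive here.
MIN_SECONDS, MAX_SECONDS = 1.0, 30.0


def find_audio_files(root):
    """Recursively collect files with an accepted extension (layout does not matter)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in ALLOWED_EXTENSIONS
    )


def wav_duration_seconds(path):
    """Duration of a PCM .wav file, computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_in_range(seconds):
    """True if a clip length falls inside the 1 to 30 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def check_dataset(root):
    """Return (kept, skipped, unchecked) lists of paths.

    Only .wav durations are measured; other accepted formats are
    returned as 'unchecked' because the stdlib cannot decode them.
    """
    kept, skipped, unchecked = [], [], []
    for path in find_audio_files(root):
        if path.suffix.lower() != ".wav":
            unchecked.append(path)
        elif duration_in_range(wav_duration_seconds(path)):
            kept.append(path)
        else:
            skipped.append(path)
    return kept, skipped, unchecked
```

Calling `check_dataset("path/to/dataset")` returns the `.wav` files that fall inside the window, those that would likely be ignored, and the files whose length would need an audio decoding library to verify.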
@@ -143,7 +144,9 @@ where:
- `save-every` is the number of steps between model checkpoint saves
- `num-workers` is the number of workers for data loading; set it to 0 on Windows

4. After training, you can use the trained model for inference by specifying the path to the checkpoint and config file.
4. If training stops unexpectedly, you can resume it by running the same command again; training will continue from the last checkpoint. (Make sure the `run-name` and `config` arguments are the same so that the latest checkpoint can be found.)

5. After training, you can use the trained model for inference by specifying the path to the checkpoint and config file.
- They should be under `./runs/<run-name>/`, with the checkpoint named `ft_model.pth` and the config file keeping the same name as the training config file.
- You still have to specify a reference audio file of the speaker you'd like to use during inference, similar to zero-shot usage.
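
The output locations described above can be derived from the training arguments. The helper below is a hypothetical illustration based only on the layout stated in this README (`./runs/<run-name>/ft_model.pth` plus a config file with the same name as the training config); it is not part of the project, and the run name and config path in the usage note are made up.

```python
from pathlib import Path


def expected_run_artifacts(run_name, training_config, runs_dir="./runs"):
    """Return (checkpoint_path, config_path) for a finished fine-tuning run.

    Assumes the layout described in the README: both files live under
    ./runs/<run-name>/, and the config keeps the training config's file name.
    """
    run_dir = Path(runs_dir) / run_name
    checkpoint = run_dir / "ft_model.pth"
    config = run_dir / Path(training_config).name
    return checkpoint, config
```

For example, `expected_run_artifacts("my_run", "configs/my_config.yml")` yields the two paths to pass as the checkpoint and config arguments at inference time, alongside a reference audio file.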
