additional notes on training

Plachtaa · Nov 29, 2024 · 1399efc
1 parent a516247
commit 1399efc

Showing 2 changed files with 8 additions and 2 deletions.
diff --git a/README-ZH.md b/README-ZH.md
@@ -103,6 +103,7 @@ python real-time-gui.py --checkpoint <path-to-checkpoint> --config <path-to-conf
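 Here is a simple Colab example for reference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1R1BJTqMsTXZzYAVx3j1BiemFXog9pbQG?usp=sharing)
 1. Prepare your dataset. It must satisfy the following requirements:
     - File structure does not matter
+    - Each audio clip must be 1 to 30 seconds long; otherwise it is ignored automatically
     - All audio files must be in one of the following formats: `.wav` `.flac` `.mp3` `.m4a` `.opus` `.ogg`
     - Speaker labels are not required, but make sure each speaker has at least 1 utterance
     - Of course, the more data you have, the better the model performs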
@@ -134,7 +135,9 @@ where:
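 - `save-every` is the number of steps between model checkpoint saves
 - `num-workers` is the number of data-loading workers; 0 is recommended on Windows
 
-4. After training, you can run inference by specifying the paths to the checkpoint and config file.
+4. To resume training from where it last stopped, simply run the same command again. As long as you pass the same `run-name` and `config` arguments, the program can find the checkpoints and logs from the previous run.
+
+5. After training, you can run inference by specifying the paths to the checkpoint and config file.
     - They should be located under `./runs/<run-name>/`; the checkpoint is named `ft_model.pth`, and the config file has the same name as the training config file.
     - At inference time, you still need to specify a reference audio file for the speaker you want to use, as in zero-shot inference.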
 

diff --git a/README.md b/README.md
@@ -112,6 +112,7 @@ Fine-tuning on custom data allow the model to clone someone's voice more accurat
 A Colab Tutorial is here for you to follow: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1R1BJTqMsTXZzYAVx3j1BiemFXog9pbQG?usp=sharing)
 1. Prepare your own dataset. It has to satisfy the following:
     - File structure does not matter
+    - Each audio file should be 1 to 30 seconds long; files outside this range are ignored
     - All audio files should be in one of the following formats: `.wav` `.flac` `.mp3` `.m4a` `.opus` `.ogg`
     - Speaker label is not required, but make sure that each speaker has at least 1 utterance
     - Of course, the more data you have, the better the model will perform
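
The dataset requirements listed above can be checked before training with a small pre-flight script. The following is a minimal sketch, not the repository's own filtering code: the names `is_usable` and `split_dataset` are hypothetical, the durations are assumed to have been measured beforehand with an audio library of your choice, and treating the 1 and 30 second bounds as inclusive and the extensions as case-insensitive are both assumptions.

```python
from pathlib import Path

# Formats and duration range quoted from the README; everything else here
# (function names, inclusive bounds, case-insensitive matching) is an assumption.
ALLOWED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a", ".opus", ".ogg"}
MIN_SECONDS = 1.0
MAX_SECONDS = 30.0


def is_usable(path: str, duration_seconds: float) -> bool:
    """Return True if the file has an accepted extension and length."""
    extension_ok = Path(path).suffix.lower() in ALLOWED_EXTENSIONS
    length_ok = MIN_SECONDS <= duration_seconds <= MAX_SECONDS
    return extension_ok and length_ok


def split_dataset(durations: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split a {path: duration in seconds} mapping into (kept, skipped) paths."""
    kept, skipped = [], []
    for path, seconds in durations.items():
        (kept if is_usable(path, seconds) else skipped).append(path)
    return kept, skipped
```

Calling `split_dataset` on a mapping of file paths to measured durations gives a list of files that meet the stated requirements and a list that would likely be ignored, which makes it easier to see how much of a dataset actually contributes to training.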
@@ -143,7 +144,9 @@ where:
 - `save-every` is the number of steps between model checkpoint saves
 - `num-workers` is the number of workers for data loading; set it to 0 on Windows
 
-4. After training, you can use the trained model for inference by specifying the path to the checkpoint and config file.
+4. If training stops unexpectedly, you can resume it by running the same command again; training continues from the last checkpoint. (Make sure the `run-name` and `config` arguments are the same so that the latest checkpoint can be found.)
+
+5. After training, you can use the trained model for inference by specifying the path to the checkpoint and config file.
     - They should be under `./runs/<run-name>/`, with the checkpoint named `ft_model.pth` and config file with the same name as the training config file.
     - You still have to specify a reference audio file of the speaker you'd like to use during inference, similar to zero-shot usage.
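
The output layout described in step 5 can be turned into paths with a few lines of Python. This is a minimal sketch based only on the README text: the helper name `expected_outputs` and the example run name and config path are hypothetical, and the helper only builds the paths without checking that the files exist.

```python
from pathlib import Path


def expected_outputs(run_name: str, config_path: str, runs_dir: str = "./runs"):
    """Build the checkpoint and config paths the README describes.

    Hypothetical helper: it encodes the stated layout
    ./runs/<run-name>/ft_model.pth plus a config file that keeps the
    same file name as the training config.
    """
    run_dir = Path(runs_dir) / run_name
    checkpoint = run_dir / "ft_model.pth"
    config = run_dir / Path(config_path).name
    return checkpoint, config


# Example with made-up names; substitute your own run name and config file.
checkpoint, config = expected_outputs("my_run", "configs/my_config.yml")
print(checkpoint.as_posix())
print(config.as_posix())
```

The two resulting paths are what you would pass as the checkpoint and config arguments at inference time, along with a reference audio file for the target speaker.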