About normalizing the prompt mel-spectrogram #8

hopingZ · 2023-11-03T09:40:41Z

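Many thanks to the expert author for open-sourcing this! Could you share the cmvn.ark file? 🙏🏻🙏🏻🙏🏻
For now I'm feeding an unnormalized mel-spectrogram as the prompt. The pronunciation is very clear, but the timbre doesn't match the prompt speaker well, so I'd like to hear the result after normalization 🙏🏻🙏🏻🙏🏻
I'd also like to confirm the mel-spectrogram parameters: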

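import librosa
import torch

# logmelspectrogram is the poster's feature-extraction helper (its definition is not shown in this thread)
# Load the prompt audio, resampled to 16 kHz
prompt_wav, sr = librosa.load(prompt_src_wav_file, sr=16000)
prompt = logmelspectrogram(
    x=prompt_wav.T,
    fs=16000,
    n_mels=80,
    n_fft=1024,
    n_shift=160,
    win_length=465,
    window="hann",
    fmin=80,
    fmax=7600).squeeze()[None, :, :]  # add a leading batch dimension
prompt = torch.FloatTensor(prompt)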

Is it right that loading the audio like this and then normalizing it is enough to match what the model expects?

cantabile-kwok · 2023-11-04T09:18:38Z

Hello, I've uploaded the CMVN file at this link. Feel free to give it a try.

As for the mel-spectrogram parameters, they look correct. You just need to stay consistent with the pipeline in utils/compute-fbank-feats.py.
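For readers following along, the normalization step asked about above can be sketched roughly as follows. This is only an illustration under assumptions: `normalize_prompt` is a hypothetical helper, not a function from the repository, and the `mean` and `std` arrays here are computed from placeholder data rather than read from the shared CMVN file. It assumes the prompt has shape (1, frames, 80), matching the snippet in the question.

```python
import numpy as np

def normalize_prompt(prompt, mean, std, eps=1e-8):
    """Per-dimension mean/variance normalization of a (1, T, D) prompt.

    Hypothetical helper for illustration; mean and std are (D,) arrays.
    """
    prompt = np.asarray(prompt, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if prompt.shape[-1] != mean.shape[0] or mean.shape != std.shape:
        raise ValueError("feature dimension of prompt and statistics must match")
    # Broadcasting applies the (D,) statistics to every frame.
    return (prompt - mean) / np.maximum(std, eps)

# Placeholder data standing in for a real log-mel prompt (assumption).
rng = np.random.default_rng(0)
prompt = rng.normal(loc=-5.0, scale=2.0, size=(1, 300, 80))

# In practice these would be derived from the CMVN file used in training;
# here they are computed from the placeholder itself.
mean = prompt.mean(axis=(0, 1))
std = prompt.std(axis=(0, 1))

normalized = normalize_prompt(prompt, mean, std)
print(normalized.shape)  # (1, 300, 80)
```

The result can then be wrapped with `torch.FloatTensor(...)` exactly as in the original snippet.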

(I can't really claim the title of "expert"; I only did a bit of work on the code, haha)

hopingZ · 2023-11-04T09:37:36Z

Thank you so, so much!! ❤️

segmentationFaults · 2023-11-21T02:15:23Z

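Even after normalization it still doesn't sound very similar, and there is still a gap from the paper's demo. Is that because the paper's model was trained on more data?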

cantabile-kwok · 2023-11-21T04:51:02Z

@segmentationFaults The model in the paper was trained on exactly the LibriTTS train clean+other data. Which sentence did you use for testing?

segmentationFaults · 2023-11-21T07:28:17Z

The sentence is 1089_134686_000002_000000, and the prompt is an utterance I picked at random.

cantabile-kwok · 2023-11-21T07:37:32Z

@segmentationFaults Could you try synthesizing with the provided model parameters and see whether the result is any different?

segmentationFaults · 2023-11-21T07:47:43Z

OK, I'll give it a try.

danablend · 2023-11-24T20:34:49Z

Quick question:

How do you generate the CMVN file for new datasets? I've tried using extract_fbank.sh which uses utils/compute-cmvn-stats.py to generate a CMVN file, but the tensor I end up with has values orders of magnitude lower than the CMVN file you uploaded.

This is mine:

And this is yours:

I just ran extract_fbank.sh and it generated the CMVN file, but this doesn't seem quite right. Did you go through a different process? Thanks!

cantabile-kwok · 2023-11-25T12:48:12Z

@danablend I think that might still be correct. The CMVN process does not actually print the "mean" of each feature dimension. It computes the sum and the sum of squares over each feature dimension. So, if the number of samples is different, the computed CMVN values can differ by orders of magnitude. May I ask how large your dataset is?
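To make the point about accumulated statistics concrete, here is a minimal sketch. It assumes a Kaldi-style layout for the statistics: a 2 x (D+1) matrix whose first row holds the per-dimension sums with the frame count in the last column, and whose second row holds the per-dimension sums of squares. Whether utils/compute-cmvn-stats.py writes exactly this layout is an assumption here, and both function names are hypothetical.

```python
import numpy as np

def accumulate_cmvn_stats(utterances):
    """Accumulate sums, sums of squares, and frame count over (T, D) arrays."""
    dim = utterances[0].shape[1]
    stats = np.zeros((2, dim + 1), dtype=np.float64)
    for feats in utterances:
        stats[0, :dim] += feats.sum(axis=0)          # per-dimension sum
        stats[1, :dim] += (feats ** 2).sum(axis=0)   # per-dimension sum of squares
        stats[0, dim] += feats.shape[0]              # total number of frames
    return stats

def stats_to_mean_std(stats):
    """Convert accumulated statistics into per-dimension mean and std."""
    count = stats[0, -1]
    mean = stats[0, :-1] / count
    var = stats[1, :-1] / count - mean ** 2
    std = np.sqrt(np.maximum(var, 1e-20))  # guard against tiny negative values
    return mean, std

# Two synthetic datasets drawn from the same distribution, 100x apart in size.
rng = np.random.default_rng(0)
small = [rng.normal(-5.0, 2.0, size=(50, 4)) for _ in range(10)]
large = [rng.normal(-5.0, 2.0, size=(50, 4)) for _ in range(1000)]

stats_small = accumulate_cmvn_stats(small)
stats_large = accumulate_cmvn_stats(large)

# The raw accumulators differ by about two orders of magnitude ...
print(stats_small[0, 0], stats_large[0, 0])
# ... while the derived means and standard deviations are close.
print(stats_to_mean_std(stats_small)[0], stats_to_mean_std(stats_large)[0])
```

Comparing the derived mean and standard deviation, rather than the raw file contents, is the fairer way to check two CMVN files against each other.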

danablend · 2023-11-25T14:36:54Z

@cantabile-kwok Aha, that is good to know. My dataset was very small, only about 1000 audio samples total.

If I have a dataset with a different size from the dataset that you used to generate the CMVN.ark file, would this still work okay, or could this cause big issues? Thanks!

cantabile-kwok · 2023-11-26T08:20:23Z

@danablend That depends on whether you are training the model on this new dataset or performing inference on it.

If training on this dataset: then you should use the new dataset. Estimating CMVN on it and normalizing with that newly generated file is correct, because you want your training samples to have zero mean and unit variance. However, note that a dataset with only 1000 samples is considered very small (I guess it is only 2 hours or so, right?). Training on it is generally not recommended, since the model relies on a good amount of data to learn well.
If running inference on this dataset: then you should use the CMVN file that was used when training the model. In our case, that is the CMVN file estimated on LibriTTS. This is because you are assuming that the new dataset has the same distribution as the training dataset.
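The two cases above can be illustrated with a small sketch on synthetic features. Everything here is a stand-in: the arrays are random placeholders rather than real fbank features, and `estimate_mean_std` and `apply_cmvn` are hypothetical helpers, not functions from the repository.

```python
import numpy as np

def estimate_mean_std(frames):
    """Per-dimension mean and std over a (num_frames, D) array."""
    frames = np.asarray(frames, dtype=np.float64)
    return frames.mean(axis=0), frames.std(axis=0)

def apply_cmvn(frames, mean, std, eps=1e-8):
    """Normalize frames with the given per-dimension statistics."""
    return (np.asarray(frames, dtype=np.float64) - mean) / np.maximum(std, eps)

rng = np.random.default_rng(0)
train_frames = rng.normal(-5.0, 2.0, size=(20000, 80))  # stand-in for training data
new_frames = rng.normal(-4.0, 1.5, size=(2000, 80))     # stand-in for unseen data

# Training case: estimate statistics on the training set and normalize with them.
train_mean, train_std = estimate_mean_std(train_frames)
train_norm = apply_cmvn(train_frames, train_mean, train_std)

# Inference case: reuse the training statistics; do not re-estimate on new data.
new_norm = apply_cmvn(new_frames, train_mean, train_std)

print(np.abs(train_norm.mean(axis=0)).max())  # close to zero by construction
print(new_norm.mean(), new_norm.std())        # not exactly 0 and 1
```

The new data does not come out with exactly zero mean and unit variance, which is expected: it is mapped into the same feature space the model saw during training, and any remaining offset reflects a real difference between the two datasets.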
