Hello community! I’m an ML engineer, but I have no experience with audio/speech synthesis. Could somebody please explain how this works as if I were five? (Please, no papers.)
Is there a vocoder (a mel-to-audio converter) and a synthesizer (a mel generator, e.g., a GAN or diffusion model)? Is converting audio to a mel spectrogram just a simple Python script, or does that step also involve an AI model? I understand the project is based on Bert-VITS2, but I'm unfamiliar with that architecture.
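To clarify what I mean by "just a simple Python script": my assumption is that mel extraction is a deterministic signal-processing step, roughly like the librosa sketch below. The parameter values (n_fft, hop_length, n_mels) are placeholders, not this repo's actual settings.

```python
import librosa

# Assumption: mel extraction is plain DSP with no learned model involved.
# All parameter values here are illustrative placeholders.
y, sr = librosa.load("sample.wav", sr=22050)            # waveform at a fixed sample rate
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80   # STFT + mel filterbank
)
log_mel = librosa.power_to_db(mel)                      # log-compress for the model
print(log_mel.shape)                                    # (n_mels, n_frames)
```

Is that roughly what happens here, or is an AI model also involved in that step?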
LLMs generate tokens one by one (autoregressively), somewhat like a person thinking out loud, so why is this a non-autoregressive model? Does it generate everything at once?
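To show where my confusion is, here is my rough mental model of the two approaches. The method names (predict_next, predict_durations, predict_all_frames) are hypothetical placeholders, not this repo's API.

```python
# Hypothetical sketch of my mental model; none of these methods come from this repo.

def generate_autoregressive(model, prompt, max_steps):
    # Each new token is conditioned on everything generated so far.
    tokens = list(prompt)
    for _ in range(max_steps):
        next_token = model.predict_next(tokens)            # one token per step
        tokens.append(next_token)
    return tokens

def generate_non_autoregressive(model, text):
    # The whole output is predicted in parallel, typically after first
    # predicting how many frames each input unit should occupy.
    durations = model.predict_durations(text)               # e.g., frames per phoneme
    frames = model.predict_all_frames(text, durations)      # all mel frames in one pass
    return frames
```

Is the second sketch closer to what this model does?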
Thank you!