Update TTS readme (#224)
jiaqili3 authored Jun 25, 2024
1 parent f96a153 commit 5dfe9fd
Showing 8 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions egs/tts/README.md
@@ -3,14 +3,14 @@

## Quick Start

-We provide a **[beginner recipe](VALLE/)** to demonstrate how to train a cutting edge TTS model. Specifically, it is Amphion's re-implementation for [Vall-E](https://arxiv.org/abs/2301.02111), which is a zero-shot TTS architecture that uses a neural codec language model with discrete codes.
+We provide a **[beginner recipe](VALLE_V2/)** to demonstrate how to train a cutting edge TTS model. Specifically, it is Amphion's re-implementation for [VALL-E](https://arxiv.org/abs/2301.02111), which is a zero-shot TTS architecture that uses a neural codec language model with discrete codes.

## Supported Model Architectures

Until now, Amphion TTS supports the following models or architectures,
- **[FastSpeech2](FastSpeech2)**: A non-autoregressive TTS architecture that utilizes feed-forward Transformer blocks.
- **[VITS](VITS)**: An end-to-end TTS architecture that utilizes conditional variational autoencoder with adversarial learning
-- **[Vall-E](VALLE)**: A zero-shot TTS architecture that uses a neural codec language model with discrete codes.
+- **[VALL-E](VALLE_V2)**: A zero-shot TTS architecture that uses a neural codec language model with discrete codes. This model is our updated VALL-E implementation as of June 2024 which uses Llama as its underlying architecture. The previous version of VALL-E release can be found [here](VALLE)
- **[NaturalSpeech2](NaturalSpeech2)** (👨‍💻 developing): An architecture for TTS that utilizes a latent diffusion model to generate natural-sounding voices.

## Amphion TTS Demo
File renamed without changes.
4 changes: 2 additions & 2 deletions egs/tts/valle_v2/demo.ipynb → egs/tts/VALLE_V2/demo.ipynb
@@ -22,9 +22,9 @@
"source": [
"# put your cheackpoint file (.bin) in the root path of AmphionVALLEv2\n",
"# or use your own pretrained weights\n",
-"ar_model_path = 'ckpts/valle_ar_mls_196000.bin' #huggingface-cli download jiaqili3/vallex valle_ar_mls_196000.bin valle_nar_mls_164000.bin --local-dir ckpts\n",
+"ar_model_path = 'ckpts/valle_ar_mls_196000.bin' # huggingface-cli download amphion/valle valle_ar_mls_196000.bin valle_nar_mls_164000.bin --local-dir ckpts\n",
"nar_model_path = 'ckpts/valle_nar_mls_164000.bin'\n",
-"speechtokenizer_path = 'ckpts/speechtokenizer_hubert_avg' # huggingface-cli download fnlp/SpeechTokenizer speechtokenizer_hubert_avg/SpeechTokenizer.pt speechtokenizer_hubert_avg/config.json --local-dir ckpts"
+"speechtokenizer_path = 'ckpts/speechtokenizer_hubert_avg' # huggingface-cli download amphion/valle speechtokenizer_hubert_avg/SpeechTokenizer.pt speechtokenizer_hubert_avg/config.json --local-dir ckpts"
]
},
{
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
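
The updated notebook comments fetch the checkpoints with `huggingface-cli download amphion/valle ... --local-dir ckpts`. As a rough Python equivalent, the sketch below uses `hf_hub_download` from the `huggingface_hub` package. The repository id and file names are copied from the notebook comments in this commit; the helper names `expected_paths` and `download_checkpoints` are hypothetical and not part of Amphion, and the sketch assumes the files are still published under those names.

```python
from pathlib import Path

# Repository id and file names as they appear in the updated notebook comments.
REPO_ID = "amphion/valle"
FILES = [
    "valle_ar_mls_196000.bin",
    "valle_nar_mls_164000.bin",
    "speechtokenizer_hubert_avg/SpeechTokenizer.pt",
    "speechtokenizer_hubert_avg/config.json",
]


def expected_paths(local_dir="ckpts"):
    """Hypothetical helper: where each file should end up under local_dir."""
    return [Path(local_dir) / name for name in FILES]


def download_checkpoints(local_dir="ckpts"):
    """Hypothetical helper: fetch every file in FILES into local_dir.

    Needs network access and the huggingface_hub package, so the import is
    done lazily to keep the rest of the module usable without it.
    """
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in FILES
    ]
```

Calling `download_checkpoints()` once should leave the files where the notebook's `ar_model_path`, `nar_model_path`, and `speechtokenizer_path` variables expect them, provided the notebook is run from the same working directory.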
