Skip to content

Commit

Permalink
Add HuggingFace and Openxlab gradio demos for SVC, TTA and TTS (#42)
Browse files Browse the repository at this point in the history
* update demo badges and links
* add svc and tta pretrained models badges
  • Loading branch information
RMSnow authored Dec 19, 2023
1 parent c0a1958 commit d76cf2d
Show file tree
Hide file tree
Showing 5 changed files with 18 additions and 4 deletions.
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
<div>
<a href="https://arxiv.org/abs/2312.09911"><img src="https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg"></a>
<a href="https://huggingface.co/amphion"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Amphion-pink"></a>
<a href="https://openxlab.org.cn/usercenter/Amphion"><img src="https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg"></a>
<a href="egs/tts/README.md"><img src="https://img.shields.io/badge/README-TTS-blue"></a>
<a href="egs/svc/README.md"><img src="https://img.shields.io/badge/README-SVC-blue"></a>
<a href="egs/tta/README.md"><img src="https://img.shields.io/badge/README-TTA-blue"></a>
Expand Down Expand Up @@ -62,7 +63,7 @@ Here is the Amphion v0.1 demo, whose voice, audio effects, and singing voice are
- Flow-based vocoders: [WaveGlow](https://arxiv.org/abs/1811.00002).
- Diffusion-based vocoders: [Diffwave](https://arxiv.org/abs/2009.09761).
- Auto-regressive based vocoders: [WaveNet](https://arxiv.org/abs/1609.03499), [WaveRNN](https://arxiv.org/abs/1802.08435v1).
- Amphion provides the official implementation of [Multi-Scale Constant-Q Transform Discriminator](https://arxiv.org/abs/2311.14957). It can be used to enhance any architecture GAN-based vocoders during training, and keep the inference stage (such as memory or speed) unchanged. [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2311.14957) [![code](https://img.shields.io/badge/README-Code-red)](egs/vocoder/gan/tfr_enhanced_hifigan)
- Amphion provides the official implementation of [Multi-Scale Constant-Q Transform Discriminator](https://arxiv.org/abs/2311.14957) (our ICASSP 2024 paper). It can be used to enhance any architecture GAN-based vocoders during training, and keep the inference stage (such as memory or speed) unchanged. [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2311.14957) [![code](https://img.shields.io/badge/README-Code-red)](egs/vocoder/gan/tfr_enhanced_hifigan)

### Evaluation

Expand Down
3 changes: 3 additions & 0 deletions egs/svc/MultipleContentsSVC/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,9 @@

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2310.11160)
[![demo](https://img.shields.io/badge/SVC-Demo-red)](https://www.zhangxueyao.com/data/MultipleContentsSVC/index.html)
[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-pink)](https://huggingface.co/amphion/singing_voice_conversion)
[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Spaces-yellow)](https://huggingface.co/spaces/amphion/singing_voice_conversion)
[![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/singing_voice_conversion)

<br>
<div align="center">
Expand Down
6 changes: 6 additions & 0 deletions egs/tta/RECIPE.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# Text-to-Audio with Latent Diffusion Model

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2304.00830)
[![demo](https://img.shields.io/badge/SVC-Demo-red)](https://audit-demo.github.io/)
[![model](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-pink)](https://huggingface.co/amphion/text_to_audio)
[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Spaces-yellow)](https://huggingface.co/spaces/amphion/Text-to-Audio)
[![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/Text-to-Audio)

This is the quicktour for training a text-to-audio model with the popular and powerful generative model: [Latent Diffusion Model](https://arxiv.org/abs/2112.10752). Specially, this recipe is also the official implementation of the text-to-audio generation part of our NeurIPS 2023 paper "[AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models](https://arxiv.org/abs/2304.00830)". You can check the last part of [AUDIT demos](https://audit-demo.github.io/) to see same text-to-audio examples.

<br>
Expand Down
6 changes: 4 additions & 2 deletions egs/tts/NaturalSpeech2/README.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
# NaturalSpeech2 Recipe

[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Spaces-yellow)](https://huggingface.co/spaces/amphion/NaturalSpeech2)

In this recipe, we will show how to train [NaturalSpeech2](https://arxiv.org/abs/2304.09116) using Amphion's infrastructure. NaturalSpeech2 is a zero-shot TTS architecture that predicts latent representations of a neural audio codec.

There are three stages in total:

1. Data processing
3. Training
4. Inference
2. Training
3. Inference

> **NOTE:** You need to run every command of this recipe in the `Amphion` root path:
> ```bash
Expand Down
4 changes: 3 additions & 1 deletion egs/tts/VITS/README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@

# VITS Recipe

[![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Spaces-yellow)](https://huggingface.co/spaces/amphion/Text-to-Speech)
[![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/Text-to-Speech)

In this recipe, we will show how to train [VITS](https://arxiv.org/abs/2106.06103) using Amphion's infrastructure. VITS is an end-to-end TTS architecture that utilizes conditional variational autoencoder with adversarial learning.

There are four stages in total:
Expand Down

0 comments on commit d76cf2d

Please sign in to comment.