diff --git a/README.md b/README.md
index 95a65ab..19dfb22 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,9 @@
 [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-blue)](https://huggingface.co/spaces/Plachta/Seed-VC)  [![arXiv](https://img.shields.io/badge/arXiv-2411.09943-<COLOR>.svg)](https://arxiv.org/abs/2411.09943)
 
 *English | [简体中文](README-ZH.md) | [日本語](README-JA.md)*  
+
+<video src="./assets/real-time-demo.webm" controls></video>
+
 Currently released model supports *zero-shot voice conversion* 🔊 , *zero-shot real-time voice conversion* 🗣️ and *zero-shot singing voice conversion* 🎶. Without any training, it is able to clone a voice given a reference speech of 1~30 seconds.  
 
 We support further fine-tuning on custom data to increase performance on specific speaker/speakers, with extremely low data requirement **(minimum 1 utterance per speaker)** and extremely fast training speed **(minimum 100 steps, 2 min on T4)**!
diff --git a/assets/real-time-demo.webm b/assets/real-time-demo.webm
new file mode 100644
index 0000000..adb079a
Binary files /dev/null and b/assets/real-time-demo.webm differ