deepbeepmeep/HunyuanVideoGP

HunyuanVideo: A Systematic Framework For Large Video Generation Model

       

01/11/2025: Version 3.0: multiple prompts / multiple generations per prompt, new progress bar, support for pretrained Loras
01/06/2025: Version 2.1: integrated Tea Cache (https://github.com/ali-vilab/TeaCache) for even faster generations
01/04/2025: Version 2.0: full leverage of mmgp 3.0 (faster, even lower RAM requirements, and support for compilation on Linux and WSL)
12/22/2024: Version 1.0: first release

GPU Poor version by DeepBeepMeep. This great video generator can now run smoothly on a 12 GB to 24 GB GPU.

This version has the following improvements over the original Hunyuan Video model:

  • Greatly reduced RAM and VRAM requirements
  • Much faster generation thanks to compilation and fast loading / unloading
  • 5 profiles, so that the model can run at a decent speed on a low-end consumer config (32 GB of RAM and 12 GB of VRAM) and at a very good speed on a high-end consumer config (48 GB of RAM and 24 GB of VRAM)
  • Automatic download of the required model files
  • Improved Gradio interface with a progress bar and more options
  • Multiple prompts / multiple generations per prompt
  • Support for multiple pretrained Loras with 32 GB of RAM or less
  • Easy switching between the Hunyuan and Fast Hunyuan models and between quantized / non-quantized models
  • Much simpler installation

This fork by DeepBeepMeep integrates the mmgp module into gradio_server.py.

It illustrates how fast, properly working CPU offloading can be added to an existing model by changing only a few lines of code in the core model.

For more information on how to use the mmgp module, please go to: https://github.com/deepbeepmeep/mmgp

You will find the original Hunyuan Video repository here: https://github.com/Tencent/HunyuanVideo

Installation Guide for Linux and Windows

We provide an environment.yml file for setting up a Conda environment. Conda's installation instructions are available here.

We recommend CUDA versions 12.4 or 11.8 for the manual installation.

# 1. Prepare conda environment
conda env create -f environment.yml

# 2. Activate the environment
conda activate HunyuanVideo

# 3. Install pip dependencies
python -m pip install -r requirements.txt


# 4.1 optional Flash attention support (easy to install on Linux but much harder on Windows)
python -m pip install flash-attn==2.7.2.post1

# 4.2 optional Sage attention support (30% faster, easy to install on Linux but much harder on Windows)
python -m pip install sageattention==1.0.6 

Note that Flash attention and Sage attention are quite complex to install on Windows, but they offer better memory management (and consequently longer videos) than the default sdpa attention. Likewise, PyTorch compilation will work on Windows only if you manage to install Triton. This is quite a complex process; I will try to provide a script in the future.
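
To see which of these optional packages are actually available in the active environment, a small check such as the one below can help. It is only a sketch: the import names `flash_attn`, `sageattention` and `triton` are assumed to be the names under which the packages install, and the helper `detect_optional_backends` is hypothetical, not part of this repository.

```python
import importlib.util

# Import names assumed for the optional packages mentioned above.
OPTIONAL_MODULES = {
    "flash attention": "flash_attn",
    "sage attention": "sageattention",
    "triton (needed for compilation)": "triton",
}


def detect_optional_backends():
    """Return {label: bool}, True when the module can be located."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for label, available in detect_optional_backends().items():
        status = "installed" if available else "not installed"
        print(f"{label}: {status}")
```

The check only confirms that a package can be found; it does not verify that the package works with the installed CUDA and PyTorch versions.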

Profiles

You can choose between 5 profiles depending on your hardware:

  • HighRAM_HighVRAM (1): at least 48 GB of RAM and 24 GB of VRAM: the fastest profile, well suited to an RTX 3090 / RTX 4090, but it consumes much more VRAM; best for fast generation of shorter videos
  • HighRAM_LowVRAM (2): at least 48 GB of RAM and 12 GB of VRAM: a bit slower, better suited to an RTX 3070/3080/4070/4080, or to an RTX 3090 / RTX 4090 with large picture batches or long videos
  • LowRAM_HighVRAM (3): at least 32 GB of RAM and 24 GB of VRAM: suited to an RTX 3090 / RTX 4090 with limited RAM, at the cost of VRAM (shorter videos)
  • LowRAM_LowVRAM (4): at least 32 GB of RAM and 12 GB of VRAM: if you have little VRAM or want to generate longer videos
  • VerylowRAM_LowVRAM (5): at least 24 GB of RAM and 10 GB of VRAM: if you don't have much of either, it won't be fast, but it may work

Profiles 2 (High RAM) and 4 (Low RAM) are the most recommended profiles since they are versatile (support for long videos at a slight performance cost).
However, a safe approach is to start from profile 5 (the default profile) and then go down progressively to profile 4 and then to profile 2, as long as the app remains responsive and does not trigger an out-of-memory error.
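
The minimums listed above can be turned into a quick lookup. The sketch below is illustrative only: `eligible_profiles` and `progression_from_safe` are hypothetical helpers, not functions of this repository, and the RAM / VRAM figures must be supplied by hand.

```python
# Minimum (RAM in GB, VRAM in GB) for each profile, copied from the list above.
PROFILE_MINIMUMS = {
    1: (48, 24),  # HighRAM_HighVRAM
    2: (48, 12),  # HighRAM_LowVRAM
    3: (32, 24),  # LowRAM_HighVRAM
    4: (32, 12),  # LowRAM_LowVRAM
    5: (24, 10),  # VerylowRAM_LowVRAM
}


def eligible_profiles(ram_gb, vram_gb):
    """Return the profile numbers whose stated minimums are met."""
    return sorted(
        profile
        for profile, (min_ram, min_vram) in PROFILE_MINIMUMS.items()
        if ram_gb >= min_ram and vram_gb >= min_vram
    )


def progression_from_safe(ram_gb, vram_gb):
    """Return the order 5 -> 4 -> 2 described above, limited to eligible profiles."""
    eligible = set(eligible_profiles(ram_gb, vram_gb))
    return [profile for profile in (5, 4, 2) if profile in eligible]


if __name__ == "__main__":
    print(eligible_profiles(32, 12))
    print(progression_from_safe(48, 12))
```

Meeting the stated minimums does not guarantee that a profile runs well for a given video size and length, so the progressive approach described above still applies.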

Run a Gradio Server on port 7860 (recommended)

python3 gradio_server.py

You can configure a RAM / VRAM profile by expanding the Video Engine Configuration section in the web interface.
If you have mistakenly chosen a configuration that your system does not support, you can force a profile when launching the app, for example the safe profile 5:

python3 gradio_server.py --profile 5

Do the following to load a pretrained Lora:

python3 gradio_server.py --lora-weight lora.safetensors --lora-multiplier 1

You can find prebuilt Loras on https://civitai.com/ or build them with tools such as kohya or onetrainer.

Command line parameters for Gradio Server

--profile no : (default 5) profile number, between 1 and 5
--quantize-transformer bool : (default True) enable / disable on-the-fly transformer quantization
--lora-weight path1 path2 ... : list of Lora paths
--lora-multiplier mult1 mult2 ... : list of relative weights (floats) for each Lora. The corresponding Lora file must be in the diffusers format.
--verbose level : (default 1) level of information, between 0 and 2
--server-port portno : (default 7860) Gradio port number
--server-name name : (default 0.0.0.0) Gradio server name
--open-browser : automatically open the browser when launching the Gradio server
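
When several Loras are involved, the command line gets long, so it can be convenient to assemble it from Python. The sketch below only uses flags listed above. The helper `build_gradio_command` is hypothetical, and the check that there is one multiplier per Lora is an assumption based on the description of --lora-multiplier, not documented behavior.

```python
import shlex


def build_gradio_command(profile=5, lora_weights=(), lora_multipliers=(),
                         server_port=None, open_browser=False):
    """Build the argument list for launching gradio_server.py."""
    if not 1 <= profile <= 5:
        raise ValueError("profile must be between 1 and 5")
    # Assumption: when multipliers are given, there is one per Lora file.
    if lora_multipliers and len(lora_multipliers) != len(lora_weights):
        raise ValueError("expected one multiplier per Lora file")

    cmd = ["python3", "gradio_server.py", "--profile", str(profile)]
    if lora_weights:
        cmd += ["--lora-weight", *lora_weights]
    if lora_multipliers:
        cmd += ["--lora-multiplier", *(str(m) for m in lora_multipliers)]
    if server_port is not None:
        cmd += ["--server-port", str(server_port)]
    if open_browser:
        cmd.append("--open-browser")
    return cmd


if __name__ == "__main__":
    command = build_gradio_command(
        profile=4,
        lora_weights=["lora.safetensors"],
        lora_multipliers=[1],
    )
    # Print a shell-ready command; pass `command` to subprocess.run to launch it.
    print(shlex.join(command))
```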

Run through the command line

cd HunyuanVideo

python3 sample_video.py \
    --video-size 720 1280 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "A cat walks on the grass, realistic style." \
    --flow-reverse \
    --save-path ./results

Please note that, currently, the profile and the models to use must be set inside the sample_video.py file.

More Configurations for Gradio Server and Command line

We list some more useful configurations for easy usage:

| Argument | Default | Description |
|---|---|---|
| --prompt | None | The text prompt for video generation |
| --video-size | 720 1280 | The size of the generated video |
| --video-length | 129 | The length of the generated video |
| --infer-steps | 50 | The number of steps for sampling |
| --embedded-cfg-scale | 6.0 | Embedded classifier-free guidance scale |
| --flow-shift | 7.0 | Shift factor for flow matching schedulers |
| --flow-reverse | False | If reverse, learning/sampling from t=1 -> t=0 |
| --seed | None | The random seed for generating video; if None, a random seed is initialized |
| --use-cpu-offload | False | Use CPU offload for the model load to save more memory; necessary for high-res video generation |
| --save-path | ./results | Path to save the generated video |
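
To make the defaults concrete, the options above can be mirrored in a small argparse parser. This is a sketch built only from the table, not the parser actually used by the repository; the argument types and the choice of `store_true` for the two boolean flags are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the arguments and defaults listed in the table above."""
    parser = argparse.ArgumentParser(description="Sketch of the sampling options")
    parser.add_argument("--prompt", type=str, default=None)
    parser.add_argument("--video-size", type=int, nargs=2, default=[720, 1280])
    parser.add_argument("--video-length", type=int, default=129)
    parser.add_argument("--infer-steps", type=int, default=50)
    parser.add_argument("--embedded-cfg-scale", type=float, default=6.0)
    parser.add_argument("--flow-shift", type=float, default=7.0)
    # Assumed to be plain switches: absent means False, present means True.
    parser.add_argument("--flow-reverse", action="store_true")
    parser.add_argument("--use-cpu-offload", action="store_true")
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--save-path", type=str, default="./results")
    return parser


if __name__ == "__main__":
    # Same options as the sample_video.py example earlier in this document.
    args = build_parser().parse_args([
        "--video-size", "720", "1280",
        "--video-length", "129",
        "--infer-steps", "50",
        "--prompt", "A cat walks on the grass, realistic style.",
        "--flow-reverse",
        "--save-path", "./results",
    ])
    print(vars(args))
```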

Abstract

We present HunyuanVideo, a novel open-source video foundation model that exhibits performance in video generation that is comparable to, if not superior to, leading closed-source models. In order to train HunyuanVideo model, we adopt several key technologies for model learning, including data curation, image-video joint model training, and an efficient infrastructure designed to facilitate large-scale model training and inference. Additionally, through an effective strategy for scaling model architecture and dataset, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models.

We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion diversity, text-video alignment, and generation stability. According to professional human evaluation results, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and 3 top-performing Chinese video generative models. By releasing the code and weights of the foundation model and its applications, we aim to bridge the gap between closed-source and open-source video foundation models. This initiative will empower everyone in the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem.

🔗 BibTeX

If you find HunyuanVideo useful for your research and applications, please cite using this BibTeX:

@misc{kong2024hunyuanvideo,
      title={HunyuanVideo: A Systematic Framework For Large Video Generative Models}, 
      author={Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai, Jianbing Wu, Jinbao Xue, Joey Wang, Junkun Yuan, Kai Wang, Mengyang Liu, Pengyu Li, Shuai Li, Weiyan Wang, Wenqing Yu, Xinchi Deng, Yang Li, Yanxin Long, Yi Chen, Yutao Cui, Yuanbo Peng, Zhentao Yu, Zhiyu He, Zhiyong Xu, Zixiang Zhou, Zunnan Xu, Yangyu Tao, Qinglin Lu, Songtao Liu, Dax Zhou, Hongfa Wang, Yong Yang, Di Wang, Yuhong Liu, and Jie Jiang, along with Caesar Zhong},
      year={2024},
      archivePrefix={arXiv preprint arXiv:2412.03603},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.03603}, 
}

🧩 Projects that use HunyuanVideo

If you develop or use HunyuanVideo in your projects, please let us know.

Acknowledgements

We would like to thank the contributors to the SD3, FLUX, Llama, LLaVA, Xtuner, diffusers and HuggingFace repositories, for their open research and exploration. Additionally, we also thank the Tencent Hunyuan Multimodal team for their help with the text encoder.
