Update 7b && Support low vram inference (#196)
---------

Co-authored-by: bubbliiiing <[email protected]>
hkunzhe and bubbliiiing authored Feb 12, 2025
1 parent f74a6cb commit 9dbb4f8
Showing 32 changed files with 500 additions and 146 deletions.
17 changes: 11 additions & 6 deletions README.md
@@ -117,22 +117,19 @@ We need about 60GB available on disk (for saving weights), please check!
EasyAnimateV5.1-12B can generate videos at the following sizes, depending on available GPU memory:
| GPU memory | 384x672x25 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
|------------|------------|------------|------------|------------|------------|------------|
| 16GB | 🧡 | 🧡 | | |||
| 16GB | 🧡 | ⭕️ | ⭕️ | ⭕️ |||
| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | 🧡 ||
| 40GB |||||||
| 80GB |||||||

Because qwen2-vl-7b ships float16 weights, it cannot run on a 16GB GPU. If your GPU has 16GB of memory, please visit [Huggingface](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8) or [Modelscope](https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8) to download the quantized qwen2-vl-7b, replace the original text encoder with it, and install the corresponding dependencies (auto-gptq, optimum).
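As a sketch of that swap, a small helper could choose the checkpoint by available VRAM. The GPTQ-Int8 repo id is taken from the links above; the float16 original's id (`Qwen/Qwen2-VL-7B-Instruct`) is an assumption:

```python
def choose_text_encoder_repo(gpu_memory_gb: int) -> str:
    """Pick a qwen2-vl-7b checkpoint that fits the available VRAM.

    Hypothetical helper: the GPTQ-Int8 repo id comes from the note above;
    the id of the float16 original is an assumption.
    """
    if gpu_memory_gb <= 16:
        # Quantized build; loading it requires auto-gptq and optimum.
        return "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8"
    return "Qwen/Qwen2-VL-7B-Instruct"
```

Only the repo id changes; loading would go through `transformers` as usual.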

The video size for EasyAnimateV5-7B can be generated by different GPU Memory, including:
EasyAnimateV5.1-7B can generate videos at the following sizes, depending on available GPU memory:
| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|----------|----------|----------|----------|----------|----------|----------|
| 16GB | 🧡 | 🧡 | | |||
| 16GB | 🧡 | 🧡 | ⭕️ | ⭕️ |||
| 24GB |||| 🧡 | 🧡 ||
| 40GB |||||||
| 80GB |||||||


✅ indicates it can run under "model_cpu_offload", 🧡 represents it can run under "model_cpu_offload_and_qfloat8", ⭕️ indicates it can run under "sequential_cpu_offload", ❌ means it can't run. Please note that running with sequential_cpu_offload will be slower.
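The three runnable modes map onto pipeline calls roughly as follows — a minimal sketch of the dispatch done in `app.py` and `comfyui/comfyui_nodes.py` elsewhere in this diff, with the call names taken from those files:

```python
def offload_plan(gpu_memory_mode: str) -> list:
    """Return the pipeline calls implied by each GPU_memory_mode (sketch)."""
    plans = {
        # Whole sub-models hop between CPU and GPU as they are needed.
        "model_cpu_offload": ["enable_model_cpu_offload"],
        # Additionally store transformer weights in qfloat8 to shrink them.
        "model_cpu_offload_and_qfloat8": [
            "convert_weight_dtype_wrapper",
            "enable_model_cpu_offload",
        ],
        # Lowest peak VRAM: layers move one at a time, so inference is slowest.
        "sequential_cpu_offload": ["enable_sequential_cpu_offload"],
    }
    if gpu_memory_mode not in plans:
        raise ValueError(f"unknown GPU_memory_mode: {gpu_memory_mode}")
    return plans[gpu_memory_mode]
```

`enable_model_cpu_offload` and `enable_sequential_cpu_offload` are standard diffusers pipeline methods; `convert_weight_dtype_wrapper` is this repo's float8 helper.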

GPUs that do not support torch.bfloat16, such as the 2080 Ti and V100, require changing weight_dtype to torch.float16 in app.py and the predict files in order to run.
@@ -501,6 +498,14 @@ For details on setting some parameters, please refer to [Readme Train](scripts/R

EasyAnimateV5.1:

7B:
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|--|
| EasyAnimateV5.1-7b-zh-InP | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |
| EasyAnimateV5.1-7b-zh-Control | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control) | Official video control weights, supporting various control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |
| EasyAnimateV5.1-7b-zh-Control-Camera | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control-Camera) | Official camera control weights, supporting control of the generation direction via input camera motion trajectories. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |
| EasyAnimateV5.1-7b-zh | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh) | Official text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |

12B:
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|--|
8 changes: 3 additions & 5 deletions README_ja-JP.md
@@ -118,17 +118,15 @@ Linuxの詳細:
EasyAnimateV5.1-12Bのビデオサイズは異なるGPUメモリにより生成できます。以下の表をご覧ください:
| GPUメモリ |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|----------|----------|----------|----------|----------|----------|----------|
| 16GB | 🧡 | 🧡 | | |||
| 16GB | 🧡 | ⭕️ | ⭕️ | ⭕️ |||
| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | 🧡 ||
| 40GB |||||||
| 80GB |||||||

qwen2-vl-7bのfloat16の重みのため、16GBのVRAMでは実行できません。もしお使いのVRAMが16GBである場合は、[Huggingface](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)または[Modelscope](https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)から量子化されたqwen2-vl-7bをダウンロードして元のtext encoderを置き換え、対応する依存ライブラリ(auto-gptq、optimum)をインストールしてください。

EasyAnimateV5-7Bのビデオサイズは異なるGPUメモリにより生成できます。以下の表をご覧ください:
EasyAnimateV5.1-7Bのビデオサイズは異なるGPUメモリにより生成できます。以下の表をご覧ください:
| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|----------|----------|----------|----------|----------|----------|----------|
| 16GB | 🧡 | 🧡 | | |||
| 16GB | 🧡 | 🧡 | ⭕️ | ⭕️ |||
| 24GB |||| 🧡 | 🧡 ||
| 40GB |||||||
| 80GB |||||||
16 changes: 11 additions & 5 deletions README_zh-CN.md
@@ -115,17 +115,15 @@ Linux 的详细信息:
EasyAnimateV5.1-12B的视频大小可以由不同的GPU Memory生成,包括:
| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|----------|----------|----------|----------|----------|----------|----------|
| 16GB | 🧡 | 🧡 | | |||
| 16GB | 🧡 | ⭕️ | ⭕️ | ⭕️ |||
| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | 🧡 ||
| 40GB |||||||
| 80GB |||||||

由于qwen2-vl-7b的float16的权重,无法在16GB显存下运行,如果您的显存是16GB,请前往[Huggingface](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)或者[Modelscope](https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)下载量化后的qwen2-vl-7b对原有的text encoder进行替换,并安装对应的依赖库(auto-gptq, optimum)。

EasyAnimateV5-7B的视频大小可以由不同的GPU Memory生成,包括:
EasyAnimateV5.1-7B的视频大小可以由不同的GPU Memory生成,包括:
| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|----------|----------|----------|----------|----------|----------|----------|
| 16GB | 🧡 | 🧡 | | |||
| 16GB | 🧡 | 🧡 | ⭕️ | ⭕️ |||
| 24GB |||| 🧡 | 🧡 ||
| 40GB |||||||
| 80GB |||||||
@@ -495,6 +493,14 @@ sh scripts/train.sh
# 模型地址
EasyAnimateV5.1:

7B:
| 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
|--|--|--|--|--|--|
| EasyAnimateV5.1-7b-zh-InP | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-InP)| 官方的图生视频权重。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |
| EasyAnimateV5.1-7b-zh-Control | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control)| 官方的视频控制权重,支持不同的控制条件,如Canny、Depth、Pose、MLSD等,同时支持使用轨迹控制。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |
| EasyAnimateV5.1-7b-zh-Control-Camera | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control-Camera)| 官方的视频相机控制权重,支持通过输入相机运动轨迹控制生成方向。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |
| EasyAnimateV5.1-7b-zh | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh)| 官方的文生视频权重。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |

12B:
| 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
|--|--|--|--|--|--|
5 changes: 2 additions & 3 deletions app.py
100644 → 100755
@@ -21,14 +21,13 @@
# resulting in slower speeds but saving a large amount of GPU memory.
#
# EasyAnimateV1, V2 and V3 support "model_cpu_offload" "sequential_cpu_offload"
# EasyAnimateV4, V5 support "model_cpu_offload" "model_cpu_offload_and_qfloat8" "sequential_cpu_offload"
# EasyAnimateV5.1 support "model_cpu_offload" "model_cpu_offload_and_qfloat8"
# EasyAnimateV4, V5 and V5.1 support "model_cpu_offload" "model_cpu_offload_and_qfloat8" "sequential_cpu_offload"
GPU_memory_mode = "model_cpu_offload_and_qfloat8"
# EasyAnimateV5.1 supports TeaCache.
enable_teacache = True
# Recommended to be set between 0.05 and 0.1. A larger threshold can cache more steps, speeding up the inference process,
# but it may cause slight differences between the generated content and the original content.
teacache_threshold = 0.1
teacache_threshold = 0.08
# Use torch.float16 if GPU does not support torch.bfloat16
# Some graphics cards, such as the V100 and 2080 Ti, do not support torch.bfloat16
weight_dtype = torch.bfloat16
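The `teacache_threshold` above trades speed for fidelity. A minimal, scalar-valued sketch of the TeaCache decision (not the repo's implementation, which works on the transformer's modulated inputs) shows how an accumulated relative change is compared against the threshold:

```python
class TeaCacheSketch:
    """Toy model of the TeaCache reuse decision (assumed simplification).

    Each denoising step accumulates the relative change of a scalar stand-in
    for the timestep-modulated input; while the accumulated change stays
    below the threshold, the cached transformer output is reused.
    """

    def __init__(self, threshold: float = 0.08):
        self.threshold = threshold
        self.accum = 0.0
        self.prev = None

    def should_reuse(self, modulated_input: float) -> bool:
        if self.prev is None:
            # First step: nothing cached yet, must compute.
            self.prev, reuse = modulated_input, False
        else:
            # Relative L1 change between consecutive steps.
            self.accum += abs(modulated_input - self.prev) / (abs(self.prev) + 1e-8)
            self.prev = modulated_input
            reuse = self.accum < self.threshold
            if not reuse:
                self.accum = 0.0  # recompute and reset the accumulator
        return reuse
```

A larger threshold lets more consecutive steps reuse the cache, which is exactly why raising it speeds up inference at some cost in output fidelity.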
8 changes: 8 additions & 0 deletions comfyui/README.md
100644 → 100755
@@ -38,6 +38,14 @@ pip install -r comfyui/requirements.txt

EasyAnimateV5.1:

7B:
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|--|
| EasyAnimateV5.1-7b-zh-InP | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |
| EasyAnimateV5.1-7b-zh-Control | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control) | Official video control weights, supporting various control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |
| EasyAnimateV5.1-7b-zh-Control-Camera | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control-Camera) | Official camera control weights, supporting control of the generation direction via input camera motion trajectories. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |
| EasyAnimateV5.1-7b-zh | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh) | Official text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports multilingual prediction. |

12B:
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|--|
9 changes: 9 additions & 0 deletions comfyui/README_zh-CN.md
100644 → 100755
@@ -36,6 +36,15 @@ pip install -r comfyui/requirements.txt
## 将模型下载到`ComfyUI/models/EasyAnimate/`

EasyAnimateV5.1:

7B:
| 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
|--|--|--|--|--|--|
| EasyAnimateV5.1-7b-zh-InP | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-InP)| 官方的图生视频权重。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |
| EasyAnimateV5.1-7b-zh-Control | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control)| 官方的视频控制权重,支持不同的控制条件,如Canny、Depth、Pose、MLSD等,同时支持使用轨迹控制。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |
| EasyAnimateV5.1-7b-zh-Control-Camera | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control-Camera)| 官方的视频相机控制权重,支持通过输入相机运动轨迹控制生成方向。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |
| EasyAnimateV5.1-7b-zh | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh)| 官方的文生视频权重。支持多分辨率(512,768,1024)的视频预测,以49帧、每秒8帧进行训练,支持多语言预测 |

12B:
| 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
|--|--|--|--|--|--|
26 changes: 22 additions & 4 deletions comfyui/comfyui_nodes.py
100644 → 100755
@@ -39,7 +39,8 @@
from ..easyanimate.utils.lora_utils import merge_lora, unmerge_lora
from ..easyanimate.utils.utils import (get_image_to_video_latent, get_image_latent,
get_video_to_video_latent)
from ..easyanimate.utils.fp8_optimization import convert_weight_dtype_wrapper
from ..easyanimate.utils.fp8_optimization import (convert_model_weight_to_float8,
convert_weight_dtype_wrapper)
from ..easyanimate.ui.ui import ddpm_scheduler_dict, flow_scheduler_dict, all_cheduler_dict

# Compatible with Alibaba EAS for quick launch
@@ -98,6 +99,11 @@ def INPUT_TYPES(s):
'EasyAnimateV5-12b-zh-InP',
'EasyAnimateV5-12b-zh-Control',
'EasyAnimateV5-12b-zh',
'EasyAnimateV5.1-7b-zh',
'EasyAnimateV5.1-7b-zh-InP',
'EasyAnimateV5.1-7b-zh-Control',
'EasyAnimateV5.1-7b-zh-Control-Camera',
'EasyAnimateV5.1-12b-zh',
'EasyAnimateV5.1-12b-zh-InP',
'EasyAnimateV5.1-12b-zh-Control',
'EasyAnimateV5.1-12b-zh-Control-Camera',
@@ -174,7 +180,7 @@ def loadmodel(self, GPU_memory_mode, model, precision, model_type, config):
model_name,
subfolder="vae"
).to(weight_dtype)
if config['vae_kwargs'].get('vae_type', 'AutoencoderKL') == 'AutoencoderKLMagvit' and weight_dtype == torch.float16:
if weight_dtype == torch.float16 and "v5.1" not in model_name.lower():
vae.upcast_vae = True
# Update pbar
pbar.update(1)
@@ -185,7 +191,7 @@ def loadmodel(self, GPU_memory_mode, model, precision, model_type, config):
]

transformer_additional_kwargs = OmegaConf.to_container(config['transformer_additional_kwargs'])
if weight_dtype == torch.float16:
if weight_dtype == torch.float16 and "v5.1" not in model_name.lower():
transformer_additional_kwargs["upcast_attention"] = True

transformer = Choosen_Transformer3DModel.from_pretrained_2d(
@@ -299,11 +305,23 @@ def loadmodel(self, GPU_memory_mode, model, precision, model_type, config):
transformer=transformer,
scheduler=scheduler,
)

if GPU_memory_mode == "sequential_cpu_offload":
pipeline._manual_cpu_offload_in_sequential_cpu_offload = []
for name, _text_encoder in zip(["text_encoder", "text_encoder_2"], [pipeline.text_encoder, pipeline.text_encoder_2]):
if isinstance(_text_encoder, Qwen2VLForConditionalGeneration):
if hasattr(_text_encoder, "visual"):
del _text_encoder.visual
convert_model_weight_to_float8(_text_encoder)
convert_weight_dtype_wrapper(_text_encoder, weight_dtype)
pipeline._manual_cpu_offload_in_sequential_cpu_offload = [name]
pipeline.enable_sequential_cpu_offload()
elif GPU_memory_mode == "model_cpu_offload_and_qfloat8":
pipeline.enable_model_cpu_offload()
for _text_encoder in [pipeline.text_encoder, pipeline.text_encoder_2]:
if hasattr(_text_encoder, "visual"):
del _text_encoder.visual
convert_weight_dtype_wrapper(transformer, weight_dtype)
pipeline.enable_model_cpu_offload()
else:
pipeline.enable_model_cpu_offload()
easyanimate_model = {
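The comfyui_nodes.py hunk above saves memory by deleting the Qwen2-VL vision tower (`_text_encoder.visual`), which text-only prompt encoding never uses. A self-contained sketch of that trick, with a dummy object standing in for `Qwen2VLForConditionalGeneration`:

```python
def strip_vision_tower(text_encoder) -> bool:
    """Delete the encoder's unused `visual` submodule, as the diff above does.

    Returns True when a vision tower was present and removed; once no other
    references remain, its weights can be garbage-collected.
    """
    if hasattr(text_encoder, "visual"):
        del text_encoder.visual
        return True
    return False


class _DummyEncoder:
    """Hypothetical stand-in for Qwen2VLForConditionalGeneration."""

    def __init__(self):
        self.visual = object()  # pretend vision tower


enc = _DummyEncoder()
removed = strip_vision_tower(enc)  # drops enc.visual
```

In the real pipeline this runs before `convert_model_weight_to_float8` and offloading, so the freed vision weights never occupy GPU memory at all.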
2 changes: 1 addition & 1 deletion comfyui/v5.1/easyanimatev5.1_workflow_control_camera.json
@@ -547,7 +547,7 @@
6,
1,
"Flow",
0.10,
0.08,
true,
""
]
4 changes: 2 additions & 2 deletions comfyui/v5.1/easyanimatev5.1_workflow_control_trajectory.json
@@ -140,7 +140,7 @@
"Node name for S&R": "EasyAnimate_TextBox"
},
"widgets_values": [
"一只棕褐色的狗在摇晃脑袋,坐在一个舒适的房间里的浅色沙发上。在狗的后面,架子上有一幅镶框的画,周围是粉红色的花朵。房间里的灯光柔和温暖,营造出舒适的氛围"
"一只棕褐色的狗正摇晃着脑袋,坐在一个舒适的房间里的浅色沙发上。沙发看起来柔软而宽敞,为这只活泼的狗狗提供了一个完美的休息地点。在狗的后面,靠墙摆放着一个架子,架子上挂着一幅精美的镶框画,画中描绘着一些美丽的风景或场景。画框周围装饰着粉红色的花朵,这些花朵不仅增添了房间的色彩,还带来了一丝自然和生机。房间里的灯光柔和而温暖,从天花板上的吊灯和角落里的台灯散发出来,营造出一种温馨舒适的氛围。整个空间给人一种宁静和谐的感觉,仿佛时间在这里变得缓慢而美好"
]
},
{
@@ -776,7 +776,7 @@
6,
1,
"Flow",
0.10,
0.08,
true,
""
]
2 changes: 1 addition & 1 deletion comfyui/v5.1/easyanimatev5.1_workflow_i2v.json
@@ -290,7 +290,7 @@
50,
6,
"Flow",
0.10,
0.08,
true
]
},