The size of tensor a (4) must match the size of tensor b (8) #19

Open
G-force78 opened this issue Aug 3, 2023 · 2 comments
Comments

@G-force78
Using these arguments

!python3 /content/control-a-video/inference.py --prompt "a bear practicing kungfu, with a background of mountains" --input_video /content/kungfubear.mp4 --control_mode depth --num_sample_frames 24 --inference_step 10 --guidance_scale 5 --init_noise_thres 0.75

FPS 8 output demo.gif

/content/control-a-video/inference.py:119 in

  116
  117 out = []
  118 for i in range(num_sample_frames//each_sample_frame):
❱ 119     out1 = video_controlnet_pipe(
  120             # controlnet_hint= control_maps[:,:,:each_sample_frame,:,:
  121             # images= v2v_input_frames[:,:,:each_sample_frame,:,:],
  122             controlnet_hint=control_maps[:,:,i*each_sample_frame-1:(i+

/usr/local/lib/python3.10/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context

   24     @functools.wraps(func)
   25     def decorate_context(*args, **kwargs):
   26         with self.clone():
❱  27             return func(*args, **kwargs)
   28     return cast(F, decorate_context)
   29
   30 def wrap_generator(self, func):

/content/control-a-video/model/video_diffusion/pipelines/pipeline_stable_diffusion_controlnet3d.py:418 in __call__

  415                     if controlhint_in_uncond:
  416                         control_maps_single_frame = control_maps_singl
  417
❱ 418                     down_block_res_samples_single_frame, mid_block_res
  419                             latent_model_input_single_frame,
  420                             t,
  421                             encoder_hidden_states=text_embeddings

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194 in _call_impl

  1191         # this function, and just call forward.
  1192         if not (self._backward_hooks or self._forward_hooks or self._
  1193                 or _global_forward_hooks or _global_forward_pre_hooks
❱ 1194             return forward_call(*input, **kwargs)
  1195         # Do not call functions when jit is used
  1196         full_backward_hooks, non_full_backward_hooks = [], []
  1197         if self._backward_hooks or _global_backward_hooks:

/content/control-a-video/model/video_diffusion/models/controlnet3d.py:464 in forward

  461         controlnet_cond = self.controlnet_cond_embedding(controlnet_co
  462         # print(sample.shape, controlnet_cond.shape)
  463
❱ 464         sample += controlnet_cond
  465         # 3. down
  466
  467         down_block_res_samples = (sample,)
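The failing line, `sample += controlnet_cond`, is a plain element-wise add, so every dimension must either match or be 1 under PyTorch broadcasting rules; the error in the title suggests the latents and the control maps disagree on the frame dimension (4 vs 8). A minimal sketch reproducing that class of failure (the shapes below are hypothetical, chosen only to mirror the 4-vs-8 mismatch, and are not taken from the repo):

```python
import torch

# Hypothetical (batch, channels, frames, height, width) tensors: the latent
# sample has 4 frames while the control-map embedding has 8.
sample = torch.zeros(1, 4, 4, 8, 8)
controlnet_cond = torch.zeros(1, 4, 8, 8, 8)

try:
    sample += controlnet_cond  # frame dims (4 vs 8) cannot broadcast
except RuntimeError as e:
    print(e)  # e.g. "The size of tensor a (4) must match the size of tensor b (8) ..."
```

If the mismatch here is between `num_sample_frames` chunks and the sliced `control_maps`, checking that `num_sample_frames` is compatible with the pipeline's internal chunk size (`each_sample_frame` in `inference.py`) would be a reasonable first step.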

@G-force78
Author

G-force78 commented Aug 3, 2023

What is the relationship between fps, num_sample_frames, and the length of the output video? Also, what does --sampling_rate ("skip sampling from the input video") actually mean? I notice the default value is 3; what does this do?

Cool setup, by the way; it's like an open-source version of Runway Gen-1. I imagine they used similar tricks and just have many GPUs to run it.
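For what it's worth, a sampling rate of N conventionally means keeping every N-th frame of the input video, so the sampled frames span a longer stretch of the source clip at a lower effective frame rate. A minimal sketch of that reading (the function name and behavior are my assumption about what the flag does, not the repo's actual code):

```python
def sample_frames(frames, num_sample_frames=24, sampling_rate=3):
    """Keep every `sampling_rate`-th frame, up to `num_sample_frames` frames.

    With these defaults, 24 output frames span 72 input frames; rendered
    at 8 fps, that would make the output GIF about 3 seconds long.
    """
    return frames[::sampling_rate][:num_sample_frames]

# Using frame indices 0..99 as a stand-in for decoded video frames:
picked = sample_frames(list(range(100)))
print(len(picked), picked[0], picked[1], picked[-1])  # 24 0 3 69
```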

@Weifeng-Chen
Owner

The argument name should be fixed; the current one may not be easy to understand.
