What is the relationship between fps, num_sample_frames, and the length of the output video? Also, what does `--sampling_rate` ("skip sampling from the input video") actually do? I notice the default value is 3; what does that mean in practice?
Cool setup, by the way. It's like an open-source version of Runway Gen-1; I imagine they use similar tricks and just have many GPUs to run it.
Using these arguments:

```
!python3 /content/control-a-video/inference.py --prompt "a bear practicing kungfu, with a background of mountains" --input_video /content/kungfubear.mp4 --control_mode depth --num_sample_frames 24 --inference_step 10 --guidance_scale 5 --init_noise_thres 0.75
```

the output demo.gif plays at 8 FPS.
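For reference, here's my rough mental model of how these flags interact; the `sampling_rate` semantics are my assumption, not something I've confirmed in the code:

```python
# Values from my run above; sampling_rate left at its default.
num_sample_frames = 24  # frames the model generates
fps = 8                 # playback rate of the output gif
sampling_rate = 3       # ASSUMPTION: keep every 3rd frame of the input video

# Output clip length: generated frames divided by playback fps.
output_seconds = num_sample_frames / fps
print(output_seconds)  # 3.0

# Input footage consumed, if sampling_rate really means "take every Nth frame":
input_frames_read = num_sample_frames * sampling_rate
print(input_frames_read)  # 72
```

Is that roughly right, or does `sampling_rate` interact with fps some other way?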
```
/content/control-a-video/inference.py:119 in <module>

   116
   117 out = []
   118 for i in range(num_sample_frames//each_sample_frame):
 ❱ 119     out1 = video_controlnet_pipe(
   120             # controlnet_hint= control_maps[:,:,:each_sample_frame,:,:
   121             # images= v2v_input_frames[:,:,:each_sample_frame,:,:],
   122             controlnet_hint=control_maps[:,:,i*each_sample_frame-1:(i+

/usr/local/lib/python3.10/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context

    24         @functools.wraps(func)
    25         def decorate_context(*args, **kwargs):
    26             with self.clone():
 ❱  27                 return func(*args, **kwargs)
    28         return cast(F, decorate_context)
    29
    30     def wrap_generator(self, func):

/content/control-a-video/model/video_diffusion/pipelines/pipeline_stable_diffusion_controlnet3d.py:418 in __call__

   415                     if controlhint_in_uncond:
   416                         control_maps_single_frame = control_maps_singl
   417
 ❱ 418                     down_block_res_samples_single_frame, mid_block_res
   419                             latent_model_input_single_frame,
   420                             t,
   421                             encoder_hidden_states=text_embeddings

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194 in _call_impl

  1191         # this function, and just call forward.
  1192         if not (self._backward_hooks or self._forward_hooks or self._
  1193                 or _global_forward_hooks or _global_forward_pre_hooks
 ❱ 1194             return forward_call(*input, **kwargs)
  1195         # Do not call functions when jit is used
  1196         full_backward_hooks, non_full_backward_hooks = [], []
  1197         if self._backward_hooks or _global_backward_hooks:

/content/control-a-video/model/video_diffusion/models/controlnet3d.py:464 in forward

   461         controlnet_cond = self.controlnet_cond_embedding(controlnet_co
   462         # print(sample.shape, controlnet_cond.shape)
   463
 ❱ 464         sample += controlnet_cond
   465         # 3. down
   466
   467         down_block_res_samples = (sample,)
```
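Digging into the failing call, I think the crash at `sample += controlnet_cond` may come from the frame slice on line 122 of inference.py. The end index is cut off in my traceback, but if it follows the same `-1` pattern as the start, the first loop iteration gets an empty slice. A minimal NumPy sketch of that slicing behavior (the shapes, `each_sample_frame` value, and end index are all my assumptions):

```python
import numpy as np

# Stand-in for control_maps: (batch, channels, frames, height, width).
control_maps = np.zeros((1, 3, 24, 64, 64))
each_sample_frame = 8  # assumed; I don't know the real value in inference.py

# Slice pattern from inference.py line 122; everything after "(i+" is
# truncated in my traceback, so "(i+1)*each_sample_frame-1" is a guess.
i = 0
chunk = control_maps[:, :, i * each_sample_frame - 1 : (i + 1) * each_sample_frame - 1]

# For i == 0 the start index is -1, which NumPy (and PyTorch) resolve to
# the LAST frame, index 23. Since 23 > 7, the slice is empty:
print(chunk.shape)  # (1, 3, 0, 64, 64)
```

An empty (or wrong-length) frame axis on the controlnet hint would make the broadcast in `sample += controlnet_cond` fail, which matches where the traceback ends. Can the authors confirm what the intended slice bounds are?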