Time_Reversal_Fusion

TRF_teaser_figure

This is the official PyTorch implementation of Time-Reversal Fusion (TRF), accepted at ECCV 2024. We propose a new sampling strategy, Time-Reversal Fusion, that enables an image-to-video model to generate sequences toward a given end frame without any tuning or back-propagated optimization. We call this new task "Bounded Generation"; it generalizes three scenarios in computer vision:

  1. Generating subject motion from two bound images that capture a moving subject.
  2. Synthesizing camera motion from two images captured from different viewpoints of a static scene.
  3. Achieving video looping by using the same image for both bounds.
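The fusion idea can be illustrated with a toy sketch. The assumptions here are ours, not the repository's: each frame's latent is a single float, `toy_denoise_step` is a hypothetical stand-in for one denoising step of an image-to-video model conditioned on one image, and the forward path (conditioned on the start bound) and the time-reversed backward path (conditioned on the end bound) are blended with assumed linear per-frame weights. It is not the code in svd_sequential_re.py, and it omits the other components of the full method described in the arXiv paper.

```python
from typing import Callable, List

Latents = List[float]  # toy stand-in: one scalar "latent" per frame
DenoiseStep = Callable[[Latents, float, int], Latents]


def linear_fusion_weights(num_frames: int) -> List[float]:
    """Weight given to the forward path at each frame: 1.0 at the first frame, 0.0 at the last (assumed linear schedule)."""
    if num_frames == 1:
        return [1.0]
    return [1.0 - i / (num_frames - 1) for i in range(num_frames)]


def fuse(forward: Latents, backward_reversed: Latents) -> Latents:
    """Blend the forward path with the (already time-reversed) backward path frame by frame."""
    weights = linear_fusion_weights(len(forward))
    return [w * f + (1.0 - w) * b for w, f, b in zip(weights, forward, backward_reversed)]


def bounded_sample(start: float, end: float, num_frames: int, num_steps: int,
                   denoise_step: DenoiseStep, init: float = 0.0) -> Latents:
    """Toy bounded generation: fuse a start-conditioned and an end-conditioned path at every step."""
    latents = [init] * num_frames
    for t in range(num_steps, 0, -1):
        # Forward path: denoise frames 0..N-1 conditioned on the start bound.
        forward = denoise_step(latents, start, t)
        # Backward path: run the same model on the time-reversed sequence conditioned on the end bound,
        # then reverse its output so frame indices line up with the forward path.
        backward = denoise_step(latents[::-1], end, t)[::-1]
        latents = fuse(forward, backward)
    return latents


def toy_denoise_step(latents: Latents, cond: float, t: int) -> Latents:
    """Hypothetical denoiser: moves every frame halfway toward the conditioning value (t is unused here)."""
    return [0.5 * x + 0.5 * cond for x in latents]


if __name__ == "__main__":
    frames = bounded_sample(start=0.0, end=1.0, num_frames=5, num_steps=50, denoise_step=toy_denoise_step)
    print([round(x, 3) for x in frames])  # → [0.0, 0.25, 0.5, 0.75, 1.0]
```

In this toy the frames converge to a linear interpolation between the bounds, whereas a real video model would supply plausible motion. Passing the same value for `start` and `end` covers the looping case (scenario 3): every frame converges to the shared bound.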

Please refer to the arXiv paper for more technical details and to the Project Page for more video results.

Todo

  • TRF code release
  • Bounded Generation Dataset release
  • TRF++ (Domain specific lora patch for downstream tasks) release
  • Gradio demo

Getting Started

Clone the repo:

git clone https://github.com/HavenFeng/time_reversal/
cd time_reversal

Requirements

  • Python 3.10 (numpy, skimage, scipy, opencv)
  • Diffusers
  • PyTorch >= 2.0.1 (Diffusers compatible)

You can install the dependencies by running:

pip install -r requirements.txt

If you encounter errors when installing Diffusers, please follow the official installation guide to re-install the library.
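A small script can confirm that the environment matches these requirements before running inference. It is a sketch under our own assumptions: the pip distribution names (`torch`, `diffusers`, `numpy`, `scikit-image`, `scipy`, `opencv-python`) are our guesses for the listed libraries (OpenCV, for example, may instead be installed as `opencv-python-headless`), and the version parsing is a simplified stand-in for a full PEP 440 parser such as `packaging.version`.

```python
import sys
from importlib import metadata
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Simplified parser: '2.0.1+cu118' -> (2, 0, 1); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, required: str) -> bool:
    inst, req = parse_version(installed), parse_version(required)
    # Pad with zeros so that (2, 0) compares equal to (2, 0, 0).
    width = max(len(inst), len(req))
    inst += (0,) * (width - len(inst))
    req += (0,) * (width - len(req))
    return inst >= req


# Assumed pip distribution names for the README's requirements; None means "any version".
REQUIREMENTS: List[Tuple[str, Optional[str]]] = [
    ("torch", "2.0.1"),
    ("diffusers", None),
    ("numpy", None),
    ("scikit-image", None),
    ("scipy", None),
    ("opencv-python", None),
]


def check_environment() -> List[str]:
    """Return a list of human-readable problems; an empty list means everything was found."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        problems.append(f"README lists Python 3.10, found {sys.version_info.major}.{sys.version_info.minor}")
    for dist, minimum in REQUIREMENTS:
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if minimum is not None and not version_at_least(installed, minimum):
            problems.append(f"{dist} {installed} is older than the required {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK.")
```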

Usage

  1. Run inference on the samples from the paper:
    python svd_sequential_re.py multiview
    To run the other tasks, replace "multiview" with "video frames", "gym_motion", or "image2loop". The generated results are saved in the ./output folder.
  2. TRF++ (add LoRA "patches" to enhance domain-specific tasks)
    TRF was designed to probe the bounded generation capabilities of SVD (Stable Video Diffusion) without fine-tuning, but we observed that SVD is biased in its subject and camera motion and is sensitive to conditioning factors such as FPS and motion intensity, which required careful parameter tuning for each input. To improve generation quality and robustness for other downstream tasks, we fine-tuned LoRA "patches" on various domain-specific datasets; these better support long-range linear motion and the generation of extreme 3D views.
     Coming soon.
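LoRA itself is a standard technique: a frozen weight matrix W is augmented with a trainable low-rank product B·A scaled by alpha / r, so only the small matrices are trained and the update can later be merged into W. The sketch below shows that generic formulation in plain Python. The TRF++ patches have not been released, so which SVD layers they modify, their rank, and their scaling are unknown; `LoRALinear` and `matvec` are hypothetical helpers, and in practice one would use an existing library such as PEFT or the LoRA support in Diffusers rather than hand-written matrices.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a low-rank update (alpha / r) * B @ A (hypothetical helper)."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float):
        self.weight = weight   # shape (out, in): frozen base weights
        self.lora_a = lora_a   # shape (r, in): down-projection
        self.lora_b = lora_b   # shape (out, r): up-projection
        self.rank = len(lora_a)
        self.scale = alpha / self.rank

    def __call__(self, x: Vector) -> Vector:
        # y = W x + scale * B (A x); A x is only r-dimensional, so the patch is cheap.
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self) -> Matrix:
        """Fold the patch into the base weights: W' = W + scale * B @ A."""
        out_dim, in_dim = len(self.weight), len(self.weight[0])
        return [
            [
                self.weight[i][j]
                + self.scale * sum(self.lora_b[i][k] * self.lora_a[k][j] for k in range(self.rank))
                for j in range(in_dim)
            ]
            for i in range(out_dim)
        ]


if __name__ == "__main__":
    layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]], lora_a=[[1.0, 1.0]], lora_b=[[1.0], [2.0]], alpha=1.0)
    x = [1.0, 2.0]
    print(layer(x))                           # → [4.0, 8.0]
    print(matvec(layer.merged_weight(), x))   # → [4.0, 8.0]
```

Because B is commonly initialized to zeros, a freshly created patch leaves the base model's output unchanged until it is trained; `merged_weight` shows why a trained patch adds no inference cost once it is folded into W.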
    

Evaluation

We evaluate our method on the Bounded Generation Dataset against domain-specific state-of-the-art methods.
For more details of the evaluation, please check our arXiv paper.

Citation

If you find our work useful to your research, please consider citing:

@inproceedings{Feng:TRF:ECCV2024,
  title = {Explorative In-betweening of Time and Space}, 
  author = {Feng, Haiwen and Ding, Zheng and Xia, Zhihao and Niklaus, Simon and Abrevaya, Victoria and Black, Michael J. and Zhang, Xuaner}, 
  booktitle = {European Conference on Computer Vision}, 
  year = {2024}
}

Notes

The video version of our teaser image:

TRF_teaser_video.mp4

More domain-specific LoRA patch models will be released soon.

License

This code and model are available for non-commercial scientific research purposes.

Acknowledgements

We would like to thank the authors of the recent baseline works that made it easy for us to perform quantitative and qualitative comparisons: FILM, Wide-Baseline, and Text2Cinemagraph.

This work was partly supported by the German Federal Ministry of Education and Research (BMBF): Tuebingen AI Center, FKZ: 01IS18039B

About

Official repo of "Time Reversal Fusion" (ECCV2024)
