Video to Actions -- Video Models

Grounding Video Models to Actions through Goal Conditioned Exploration

[Project page] [Paper] [ArXiv]

Yunhao Luo¹,², Yilun Du³

¹Georgia Tech, ²Brown, ³Harvard

This codebase contains the code to train the video model in "Grounding Video Models to Actions through Goal Conditioned Exploration". For experiments in the robotic environments, please see the video-to-action-release repo.

🛠️ Installation

The required conda environment is identical to the v2a_libero_release environment described in the installation section of video-to-action-release.
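For example, assuming the environment was created under that name as in video-to-action-release, activate it before running any of the commands below:

conda activate v2a_libero_release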

🗃️ Download Libero Demonstrations

Please refer to the Libero documentation for dataset downloading. By default, this codebase uses data from LIVING_ROOM_SCENE5 and LIVING_ROOM_SCENE6 in the libero_100 dataset.
We provide a data preprocessing script to prepare the data for training video models. If you would like to train video models on other scenes, first preprocess the downloaded .hdf5 Libero data and modify the dataloader accordingly.
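As a sketch of the download step (the script path and flag below follow the LIBERO repository and may change, so please verify against the Libero documentation):

# run from the root of the LIBERO repository (assumed layout)
python benchmark_scripts/download_libero_datasets.py --datasets libero_100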

📦 Data Preprocessing

The downloaded Libero demonstrations are stored as .hdf5 files. We provide a script, flowdiffusion/libero/lb_extract_imgs.py, to extract the images; the commands to run it are shown below. To extract the data successfully, the correct Libero data file paths must be set. Please check the note inside the script.

cd flowdiffusion
sh ./libero/lb_extract_imgs.sh
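Once extraction finishes, a quick sanity check is to list the output directory (datasets/libero by default, as noted below); the exact subfolder layout is determined by the script:

ls datasets/libero
# expect extracted image data for the scenes you preprocessed,
# e.g. LIVING_ROOM_SCENE5 and LIVING_ROOM_SCENE6 by default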

🕹️ Train a Model

With the extracted image data (which by default are stored in datasets/libero), you can now start training the video model.

Note that training requires 4 GPUs, each with at least 22 GB of memory.

To launch the training of the video model for Libero, run

cd flowdiffusion
sh libero/train_libero.sh

You can change the $config variable inside the script above to launch different experiments; the default config in the script can serve as a template. Checkpoints are saved in flowdiffusion/logs.
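For example, switching experiments is a one-line edit (a sketch; the actual config names are defined in the repo):

# inside libero/train_libero.sh
config=<your_config_name>   # replace with another config, using the default as a template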

A pre-trained video model is provided here.

📊 Video Model Inference

We provide a script to sample from the video model given a path to an image file and a text condition. An example start observation image is provided in the examples folder.

cd flowdiffusion
sh libero/plan_libero.sh
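To run inference on your own inputs, edit the image path and text condition inside libero/plan_libero.sh. The variable names below are illustrative placeholders, not necessarily the script's actual interface:

# hypothetical edit inside libero/plan_libero.sh -- check the script for the real variable names
image_path=examples/<start_observation_image>   # path to the start observation image
text=<task description>                         # text condition for the video model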

🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.

🙏 Acknowledgement

  • The implementation of this codebase is based on AVDC.

Contact Yunhao Luo if you have any questions or suggestions.

📝 Citations

If you find our work useful, please consider citing:

@misc{luo2024groundingvideomodelsactions,
      title={Grounding Video Models to Actions through Goal Conditioned Exploration}, 
      author={Yunhao Luo and Yilun Du},
      year={2024},
      eprint={2411.07223},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2411.07223}, 
}
