[Project page] [Paper] [ArXiv]
Yunhao Luo1,2, Yilun Du3
1Georgia Tech, 2Brown, 3Harvard
This codebase contains the code to train the video model in "Grounding Video Models to Actions through Goal Conditioned Exploration". For experiments in the robotic environments, please see the video-to-action-release repo.
The required conda environment is identical to the `v2a_libero_release` environment described in the installation section of video-to-action-release.
Please refer to the Libero documentation for instructions on downloading the dataset.
Specifically, this codebase by default uses data from `LIVING_ROOM_SCENE5` and `LIVING_ROOM_SCENE6` in the `libero_100` dataset.
We provide a data preprocessing script to prepare the data for training video models. If you would like to train video models on other scenes, you can first preprocess the downloaded `.hdf5` Libero data and modify the dataloader accordingly.
The downloaded Libero demonstrations are stored as `.hdf5` files. We provide a script, `flowdiffusion/libero/lb_extract_imgs.py`, to extract the images; the commands are shown below. To extract the data successfully, the correct Libero data file paths must be used. Please check the note inside the script.
```bash
cd flowdiffusion
sh ./libero/lb_extract_imgs.sh
```
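For reference, the sketch below shows the general idea of pulling RGB frames out of a robomimic-style Libero `.hdf5` file with `h5py`. The group names (`data/demo_*/obs/agentview_rgb`) and the output layout are assumptions for illustration; the provided `lb_extract_imgs.py` is the authoritative implementation.

```python
# Minimal sketch of extracting per-frame images from a Libero .hdf5 demo file.
# The dataset keys below (data/demo_*/obs/agentview_rgb) are assumptions based on
# the robomimic-style layout; check lb_extract_imgs.py for the exact keys used.
import os
import h5py
from PIL import Image

def extract_demo_images(hdf5_path: str, out_dir: str) -> None:
    with h5py.File(hdf5_path, "r") as f:
        for demo_name in f["data"]:  # e.g. "demo_0", "demo_1", ...
            frames = f["data"][demo_name]["obs"]["agentview_rgb"][()]  # (T, H, W, 3) uint8
            demo_dir = os.path.join(out_dir, demo_name)
            os.makedirs(demo_dir, exist_ok=True)
            for t, frame in enumerate(frames):
                Image.fromarray(frame).save(os.path.join(demo_dir, f"{t:04d}.png"))

# Example (hypothetical paths):
# extract_demo_images("path/to/LIVING_ROOM_SCENE5_task.hdf5", "datasets/libero/LIVING_ROOM_SCENE5")
```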
With the extracted image data (stored by default in `datasets/libero`), you can now start training the video model.
Note that the training requires 4 GPUs, each with at least 22 GB of memory.
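As a quick sanity check before launching (not part of the release scripts), you can verify how many GPUs are visible and how much memory each one has with PyTorch:

```python
# Quick check that enough GPUs/memory are visible before launching training.
import torch

n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")
for i in range(n_gpus):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024**3
    print(f"  cuda:{i} {props.name}: {total_gb:.1f} GB")

assert n_gpus >= 4, "Training is configured for 4 GPUs."
```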
To launch the training of the video model for Libero, run
```bash
cd flowdiffusion
sh libero/train_libero.sh
```
You can change the `$config` variable inside the script above to launch different experiments; refer to the default config in the script as a template. Checkpoints will be saved in `flowdiffusion/logs`.
A pre-trained video model is provided here.
We provide a script to sample from the video model given a path to an image file and a text condition. An example of a start observation image is in the `examples` folder.
```bash
cd flowdiffusion
sh libero/plan_libero.sh
```
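Conceptually, the sampling step takes a start-frame image and a language instruction and produces a short video (a sequence of predicted frames). The sketch below is only illustrative: `load_video_model` and `sample_video` are hypothetical placeholders, and the real entry point is the script invoked by `libero/plan_libero.sh`.

```python
# Illustrative only: load_video_model and sample_video are hypothetical placeholders
# standing in for the actual model-loading and sampling code in this repo.
from PIL import Image

start_image = Image.open("examples/<your_start_observation>.png").convert("RGB")
text_condition = "put the red mug on the plate"  # hypothetical task instruction

model = load_video_model("path/to/checkpoint")              # hypothetical helper
frames = sample_video(model, start_image, text_condition)   # list of PIL images

# Save the predicted frames as a GIF for inspection.
frames[0].save("plan.gif", save_all=True, append_images=frames[1:], duration=100, loop=0)
```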
This repository is released under the MIT license. See LICENSE for additional details.
- The implementation of this codebase is based on AVDC.
Contact Yunhao Luo if you have any questions or suggestions.
If you find our work useful, please consider citing:
@misc{luo2024groundingvideomodelsactions,
title={Grounding Video Models to Actions through Goal Conditioned Exploration},
author={Yunhao Luo and Yilun Du},
year={2024},
eprint={2411.07223},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2411.07223},
}