[Project page] [Paper] [ArXiv]
Yunhao Luo1,2, Yilun Du3
1Georgia Tech, 2Brown, 3Harvard
This codebase contains the code to train the video model in "Grounding Video Models to Actions through Goal Conditioned Exploration". For experiments in the robotic environments, please see the video-to-action-release repo.
The required conda environment is identical to the `v2a_libero_release` environment described in the installation section of video-to-action-release.
Please refer to the Libero documentation for instructions on downloading the dataset.
Specifically, this codebase by default uses data from `LIVING_ROOM_SCENE5` and `LIVING_ROOM_SCENE6` in the `libero_100` dataset.
We provide a data preprocessing script to prepare the data for training video models. If you would like to train video models on other scenes, you can first preprocess the downloaded `.hdf5` Libero data and modify the dataloader accordingly.
The downloaded Libero demonstrations are stored as `.hdf5` files. We provide a script, `flowdiffusion/libero/lb_extract_imgs.py`, to extract the images; the commands are shown below. To extract the data successfully, the correct Libero data file paths must be used. Please check the note inside the script.
```bash
cd flowdiffusion
sh ./libero/lb_extract_imgs.sh
```
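For reference, the sketch below shows the general idea of pulling RGB frames out of a robomimic-style Libero `.hdf5` file with `h5py`. The group names (`data/demo_*/obs/agentview_rgb`) and the output layout are assumptions for illustration; the provided `lb_extract_imgs.py` is the authoritative implementation.

```python
# Minimal sketch of extracting per-frame images from a Libero .hdf5 demo file.
# The dataset keys below (data/demo_*/obs/agentview_rgb) are assumptions based on
# the robomimic-style layout; check lb_extract_imgs.py for the exact keys used.
import os
import h5py
from PIL import Image

def extract_demo_images(hdf5_path: str, out_dir: str) -> None:
    with h5py.File(hdf5_path, "r") as f:
        for demo_name in f["data"]:  # e.g. "demo_0", "demo_1", ...
            frames = f["data"][demo_name]["obs"]["agentview_rgb"][()]  # (T, H, W, 3) uint8
            demo_dir = os.path.join(out_dir, demo_name)
            os.makedirs(demo_dir, exist_ok=True)
            for t, frame in enumerate(frames):
                Image.fromarray(frame).save(os.path.join(demo_dir, f"{t:04d}.png"))

# Example (hypothetical paths):
# extract_demo_images("path/to/LIVING_ROOM_SCENE5_task.hdf5", "datasets/libero/LIVING_ROOM_SCENE5")
```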
With the extracted image data (stored by default in `datasets/libero`), you can now start training the video model.
Note that the training requires 4 GPUs, each with at least 22 GB of memory.
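As a quick sanity check before launching (not part of the release scripts), you can verify how many GPUs are visible and how much memory each one has with PyTorch:

```python
# Quick check that enough GPUs/memory are visible before launching training.
import torch

n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")
for i in range(n_gpus):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024**3
    print(f"  cuda:{i} {props.name}: {total_gb:.1f} GB")

assert n_gpus >= 4, "Training is configured for 4 GPUs."
```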
To launch the training of the video model for Libero, run
```bash
cd flowdiffusion
sh libero/train_libero.sh
```
You can change the `$config` variable inside the script above to launch different experiments; refer to the default config in the script as a template. Checkpoints will be saved in `flowdiffusion/logs`.
A pre-trained video model is provided here.
We provide a script to sample from the video model given a path to an image file and a text condition. An example of a start observation image is in the `examples` folder.
```bash
cd flowdiffusion
sh libero/plan_libero.sh
```
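Conceptually, the sampling step takes a start-frame image and a language instruction and produces a short video (a sequence of predicted frames). The sketch below is only illustrative: `load_video_model` and `sample_video` are hypothetical placeholders, and the real entry point is the script invoked by `libero/plan_libero.sh`.

```python
# Illustrative only: load_video_model and sample_video are hypothetical placeholders
# standing in for the actual model-loading and sampling code in this repo.
from PIL import Image

start_image = Image.open("examples/<your_start_observation>.png").convert("RGB")
text_condition = "put the red mug on the plate"  # hypothetical task instruction

model = load_video_model("path/to/checkpoint")              # hypothetical helper
frames = sample_video(model, start_image, text_condition)   # list of PIL images

# Save the predicted frames as a GIF for inspection.
frames[0].save("plan.gif", save_all=True, append_images=frames[1:], duration=100, loop=0)
```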
This repository is released under the MIT license. See LICENSE for additional details.
- The implementation of this codebase is based on AVDC.
Contact Yunhao Luo if you have any questions or suggestions.
If you find our work useful, please consider citing:
@misc{luo2024groundingvideomodelsactions,
title={Grounding Video Models to Actions through Goal Conditioned Exploration},
author={Yunhao Luo and Yilun Du},
year={2024},
eprint={2411.07223},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2411.07223},
}