HumanVLA

This repository contains the official implementation associated with the paper: HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid.

Advances in Neural Information Processing Systems 2024 (NeurIPS 2024)

[arXiv]

Installation

The codebase is developed under Python 3.8. We recommend using conda to create the environment:

conda create -n humanvla python==3.8
conda activate humanvla

Then, install the official IsaacGym simulator from this website.
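A rough sketch of the usual IsaacGym Preview installation, assuming the downloaded archive is extracted to ~/isaacgym (adjust the path to your setup):

# install the IsaacGym python bindings from the extracted archive (assumed location)
cd ~/isaacgym/python
pip install -e .
# quick smoke test; isaacgym should be imported before torch
python -c "import isaacgym, torch; print('isaacgym ok')"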

Finally, install the remaining dependencies:

pip install -r requirements.txt

Data

We use the HITR dataset in this work. Please download it from this link.

To access the assets of the HITR dataset, you need to download the HSSD dataset, replace HSSD_ROOT in data/cp_hssd.py, and copy the assets by

python data/cp_hssd.py

The prepared data directory will have the following structure:

----data
    |---- motions
    |---- assets
    |---- HITR_tasks
    |---- humanoid_assets
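
As an optional sanity check (a minimal sketch based on the layout above), you can verify that the expected subdirectories are in place:

for d in motions assets HITR_tasks humanoid_assets; do
    test -d data/$d && echo "data/$d ok" || echo "data/$d missing"
done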

Play

Download our pretrained checkpoints from this link.
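
The play and evaluation commands below load checkpoints from ./weights, so one simple arrangement (inferred from the --ckpt paths used below) is:

mkdir -p weights
# place the downloaded files here so the paths below resolve:
#   weights/humanvla_teacher.pth
#   weights/humanvla.pth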

Then you can play with the humanoid by

# humanvla-teacher
python main.py --test --name play --force --num_envs 4 --cfg cfg/teacher_rearrangement.yaml --ckpt weights/humanvla_teacher.pth
# humanvla
python main.py --test --name play --force --num_envs 4 --cfg cfg/student_rearrangement.yaml --ckpt weights/humanvla.pth

Qualitative results are available in ./demo.

Evaluation

You can evaluate the pretrained models with the --eval argument:

# humanvla-teacher
python main.py --test --eval --name play --force --cfg cfg/teacher_rearrangement.yaml --ckpt weights/humanvla_teacher.pth
# humanvla
python main.py --test --eval --name play --force --cfg cfg/student_rearrangement.yaml --ckpt weights/humanvla.pth

Train

The training process of HumanVLA consists of two stages: state-based teacher learning in stage 1, and vision-language-directed policy distillation in stage 2.

Stage 1 itself consists of carry-curriculum pretraining followed by object rearrangement learning.

The full training process therefore includes three runs:

# carry
torchrun --standalone --nnode=1 --nproc_per_node=4 -m main --headless --cfg cfg/teacher_carry.yaml --name teacher_carry
# teacher
torchrun --standalone --nnode=1 --nproc_per_node=4 -m main --headless --cfg cfg/teacher_rearrangement.yaml --name teacher_rearrange
# humanvla
torchrun --standalone --nnode=1 --nproc_per_node=4 -m main --headless --cfg cfg/student_rearrangement.yaml --name student_rearrange

We use torchrun to enable DDP and launch training. You can adjust the GPU settings according to your devices; see the example below.
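
For example, to run the carry-pretraining stage on a single GPU instead of four (a direct variation of the command above, not a separately verified configuration):

torchrun --standalone --nnode=1 --nproc_per_node=1 -m main --headless --cfg cfg/teacher_carry.yaml --name teacher_carry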

The training results will be saved in the ./logs/[name] folder. The --force argument overwrites an existing folder with the same name.

Misc

We provide reference code for the data generation processes in ./gen, including the generation of motions, waypoints, features, and more.

Acknowledgement

We appreciate the efforts of the following open-source works, which helped the development of ours: ASE, HSSD, OMOMO, and SAMP.

Citation

If you find our work useful, please cite:

@misc{xu2024humanvla,
      title={HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid}, 
      author={Xinyu Xu and Yizheng Zhang and Yong-Lu Li and Lei Han and Cewu Lu},
      year={2024},
      eprint={2406.19972},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2406.19972}, 
}
