
Learning Options via Compression (LOVE)

This repository contains the source code for "Learning Options via Compression" presented at NeurIPS 2022.

Yiding Jiang*, Evan Zheran Liu*, Benjamin Eysenbach, J. Zico Kolter, Chelsea Finn

Setup

This code requires Python 3.7+. Install the dependencies in requirements.txt; we recommend using a virtualenv:

python3 -m virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Additionally, download the data files from this Google Drive link and place them in the root love/ directory. Then, extract them:

tar -xvf compile.tar.gz
tar -xvf world3d.tar.gz
tar -xvf checkpoints.tar.gz
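
As a quick sanity check, you can confirm the data landed where the commands below expect it. This assumes compile.tar.gz and checkpoints.tar.gz unpack into directories named compile/ and checkpoints/, which is how the later commands reference them:

ls -d compile checkpoints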

Hierarchical Reinforcement Learning Experiments

The hierarchical reinforcement learning experiments on the multi-task grid world and 3D domains consist of two phases: 1) extracting skills from the demonstrations, and 2) using those skills to learn new tasks. The first phase is implemented in train_rl.py, and the second in dqn/main.py. Below, we detail the commands that reproduce the results in the paper for each phase.

Extracting learned skills

The following commands extract learned skills with LOVE on the grid world domain and the 3D domain respectively:

PYOPENGL_PLATFORM=egl python train_rl.py \
    --name=grid_world_love \
    --coding_len_coeff=0.005 \
    --kl_coeff=0.0 \
    --rec_coeff=1.0 \
    --use_abs_pos_kl=1.0 \
    --batch-size=64 \
    --dataset-path=compile  \
    --max-iters=20000  \
    --state-size=8 \
    --use_min_length_boundary_mask \
    --latent-n=10
PYOPENGL_PLATFORM=egl python train_rl.py \
    --name=3d_love \
    --coding_len_coeff=0.001 \
    --kl_coeff=0.0 \
    --rec_coeff=1.0 \
    --use_abs_pos_kl=1.0 \
    --batch-size=64 \
    --dataset-path=miniworld \
    --use_min_length_boundary_mask \
    --max-iters=20000 \
    --state-size=64 \
    --learn-rate=0.0001

To use VTA instead, set --coding_len_coeff=0 and --kl_coeff=0.05. To use DDO instead, set --coding_len_coeff=0, --kl_coeff=1, and add the --ddo flag. We also recommend changing the experiment name in the --name flag, though this is optional. The extracted skills are saved in the model checkpoints under experiments/{name}/model-{step}.ckpt, where {name} is the argument provided to the --name flag. As described in the paper, we select the checkpoint with the best LOVE objective for LOVE, and the checkpoint with the best ELBO for the other methods. These objectives can be monitored via wandb by supplying the --wandb flag and changing the entity argument in the wandb.init call in train_rl.py to your own wandb username.
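
For example, a VTA skill-extraction run on the grid world domain would look like the following, which is just the grid world command above with the substitutions described (the --name value is an arbitrary label of our choosing):

PYOPENGL_PLATFORM=egl python train_rl.py \
    --name=grid_world_vta \
    --coding_len_coeff=0 \
    --kl_coeff=0.05 \
    --rec_coeff=1.0 \
    --use_abs_pos_kl=1.0 \
    --batch-size=64 \
    --dataset-path=compile \
    --max-iters=20000 \
    --state-size=8 \
    --use_min_length_boundary_mask \
    --latent-n=10

Its checkpoints would then appear under experiments/grid_world_vta/model-{step}.ckpt.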

Learning new tasks with the learned skills

To learn new tasks with the skills extracted above, invoke dqn/main.py, setting -b checkpoint=\"{checkpoint_path}\" to point at a model checkpoint from above. For example, the following commands reproduce the paper's results with LOVE skills on the grid world and 3D domains respectively, using the released checkpoints.

First, set the PYTHONPATH to be the root of this directory:

export PYTHONPATH=/path/to/love
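
If you are already in the repository root, a standard shell shortcut (not specific to this repo) achieves the same thing:

export PYTHONPATH=$(pwd)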

Then run one of these commands:

PYOPENGL_PLATFORM=egl python3 dqn/main.py love_grid_world \
    -b agent.policy.epsilon_schedule.total_steps=500000 \
    -b checkpoint=\"checkpoints/love_grid_world.ckpt\" \
    -b threshold=0 \
    -b sparse_reward=True \
    -b visit_length=3 \
    -b bc=False \
    -b oracle=False \
    --seed 0 \
    -b env=\"compile\"
PYOPENGL_PLATFORM=egl python3 dqn/main.py love_3d \
    -b agent.sync_target_freq=30000 \
    -b agent.policy.epsilon_schedule.total_steps=250000 \
    -b checkpoint=\"checkpoints/love_3d.ckpt\" \
    -b threshold=0.05 \
    -b sparse_reward=True \
    -b visit_length=3 \
    -b bc=False \
    -b oracle=False \
    -b recurrent=True \
    --seed 0 \
    -b env=\"3d\"

Note that these commands configure whether the reward is sparse via sparse_reward, and the number of objects to pick up via visit_length.
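
For instance, a hypothetical dense-reward variant of the grid world run above that only needs to pick up two objects changes just those two bindings (the run name love_grid_world_dense is a label of our choosing):

PYOPENGL_PLATFORM=egl python3 dqn/main.py love_grid_world_dense \
    -b agent.policy.epsilon_schedule.total_steps=500000 \
    -b checkpoint=\"checkpoints/love_grid_world.ckpt\" \
    -b threshold=0 \
    -b sparse_reward=False \
    -b visit_length=2 \
    -b bc=False \
    -b oracle=False \
    --seed 0 \
    -b env=\"compile\"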

Sequence Segmentation Didactic Example

To reproduce the didactic examples from the paper, use the following commands:

Simple Colors

python train_mdl.py \
    --coding_len_coeff=0.1 \
    --use_abs_pos_kl=1.0 \
    --batch-size=512 \
    --seed=0 \
    --dataset-path=./data/toy_data.npy \
    --max-iters=30000 

Conditional Colors

python train_mdl.py \
    --coding_len_coeff=0.1 \
    --use_abs_pos_kl=1.0 \
    --batch-size=512 \
    --seed=0 \
    --dataset-path=./data/toy_data_markov_3_option.npy \
    --max-iters=30000 

Citation

If you use this code, please cite our paper:

@inproceedings{jiang2022learning,
  title={Learning Options via Compression},
  author={Jiang, Yiding and Liu, Evan Zheran and Eysenbach, Benjamin and Kolter, Zico and Finn, Chelsea},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}
