Code for reproducing the results of Offline RL at Multiple Frequencies (arXiv, website).
The offline data was collected from replay buffers during training with either the DAU repository or this repository, and can be downloaded here.
This repository builds on Young Geng's implementation of CQL.
- Install and activate the included Anaconda environment:
$ conda env create -f environment.yml
$ source activate
- Add this repo directory to your PYTHONPATH environment variable:
export PYTHONPATH="$PYTHONPATH:$(pwd)"
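Optionally, you can confirm that the export took effect. This is a minimal sketch that is not part of the repository. The function name on_pythonpath is hypothetical, and the sketch assumes a POSIX-compatible shell such as bash, run from the repo directory.

```shell
# Hypothetical helper (not part of this repo): returns success if the given
# directory (default: the current directory) is an entry of PYTHONPATH.
on_pythonpath() {
  case ":${PYTHONPATH}:" in
    *":${1:-$(pwd)}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run from the repo directory after the export above.
if on_pythonpath; then
  echo "repo directory is on PYTHONPATH"
else
  echo "repo directory is NOT on PYTHONPATH"
fi
```

Wrapping PYTHONPATH in colons before matching means the check only accepts whole entries, so a parent directory such as /opt will not falsely match /opt/a.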
We provide example run scripts for pendulum, door, and kitchen.
For example, to run the adaptive n-step algorithm:
./run_kitchen.sh 120 101 .99 500
To run the naive mixing baseline:
./run_kitchen.sh 0 101 .99 500
The max n-step baseline can be run by setting the all_same_N flag to True, and the individual training baselines can be run by commenting out the data loaders.
By default, the scripts log to W&B, so set your W&B API key environment variable:
export WANDB_API_KEY='YOUR W&B API KEY HERE'
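Because runs log to W&B by default, it can help to check for the key before launching a long run. This is a minimal sketch that is not part of the repository. The function name require_wandb_key is hypothetical, and it only checks that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical pre-flight check (not part of this repo): fails with a message
# if WANDB_API_KEY is unset or empty. It does not validate the key itself.
require_wandb_key() {
  if [ -z "${WANDB_API_KEY:-}" ]; then
    echo "WANDB_API_KEY is not set; export it before launching a run" >&2
    return 1
  fi
  return 0
}

# Example usage: only launch the run if the key is present.
# require_wandb_key && ./run_kitchen.sh 120 101 .99 500
```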