
Continual learning with decision transformer


lubiluk/cldt


Generating a dataset

  1. Add your policy to policies.py.
  2. Add your environment to envs.py.
  3. Run the generation module:
Usage: 
generate_dataset.py [-h] [-t POLICY_TYPE] [-p POLICY_PATH] [-e ENV] [-n NUM_EPISODES] [-o OUTPUT_PATH] [--render] [--seed SEED]

Example:
python generate_dataset.py -t random -e hopper -n 1000 -o cache/hopper.pkl --render --seed 0
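Steps 1 and 2 above add entries to policies.py and envs.py. Below is a minimal sketch of what a random policy entry might look like; the class name, `act()` interface, and `POLICIES` registry are assumptions for illustration, not the repo's actual API.

```python
# Hypothetical sketch of a policy entry for policies.py (step 1 above).
# The class name, act() interface, and POLICIES registry are assumptions,
# not the repo's actual API.
import random

class RandomPolicy:
    """Uniform random actions over a discrete action space (`-t random`)."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        # The observation is ignored: the policy is purely random.
        return self.rng.randrange(self.n_actions)

# Hypothetical registry that the -t flag could look policies up in.
POLICIES = {"random": RandomPolicy}

policy = POLICIES["random"](n_actions=3, seed=0)
actions = [policy.act(None) for _ in range(5)]
```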

Downloading Atari datasets

gsutil -m cp -R gs://atari-replay-datasets/dqn/Breakout/ ./cache/

Running Atari DT

python -m experiment.atari --seed 1234 --context-length 30 --epochs 5 --model-type reward_conditioned --num-steps 500000 --num-buffers 50 --game Breakout --batch-size 128
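The `--context-length` flag (30 in the command above) controls how many recent timesteps the transformer attends over. A small illustrative sketch of that windowing, not the repo's actual batching code:

```python
# Illustrative slicing of one trajectory into context windows, mirroring
# the --context-length flag above (30 in the example command).
# This is a sketch, not the repo's actual batching code.

def context_windows(trajectory, context_length):
    """Yield, for each timestep, at most the last `context_length` steps."""
    for t in range(len(trajectory)):
        start = max(0, t - context_length + 1)
        yield trajectory[start:t + 1]

traj = list(range(100))          # stand-in for a 100-step trajectory
windows = list(context_windows(traj, 30))
# Early windows are shorter; later windows hold exactly 30 steps.
```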

Multi-Goal examples

Generate a PandaReach dataset. The demonstrator needs the time-feature wrapper.

python generate_dataset.py -t reach -e panda-reach-dense -n 100000 -o datasets/panda_reach_dense_random.pkl -w time-feature
python generate_dataset.py -t reach -p demonstrators/tqcher_panda_reach_dense_tf.zip -e panda-reach-dense -n 100000 -o datasets/panda_reach_dense_expert.pkl -w time-feature
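The generated .pkl files can be inspected with pickle. The episode layout below (a pickled list of dicts with observations/actions/rewards) is an assumption about the file format, not confirmed by the repo:

```python
# Hedged sketch of reading a generated .pkl dataset. The episode layout
# (a pickled list of dicts with observations/actions/rewards) is an
# assumption about the file format, not confirmed by the repo.
import pickle

# Write a tiny stand-in dataset so the example is self-contained.
episodes = [
    {"observations": [[0.0], [0.1]], "actions": [[1.0]], "rewards": [0.5]},
]
with open("example_dataset.pkl", "wb") as f:
    pickle.dump(episodes, f)

with open("example_dataset.pkl", "rb") as f:
    data = pickle.load(f)

n_steps = sum(len(ep["rewards"]) for ep in data)
```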

Generate a PandaPush dataset. The demonstrator needs the time-feature wrapper.

python generate_dataset.py -t tqc+her -p demonstrators/sb3_tqc_panda_push_sparse.zip -e panda-push-sparse -n 100000 -o datasets/panda_push_sparse_100k_expert.pkl -w time-feature

Train Decision Transformer on PandaPush.

python train_single.py -c configs/ICRA_1mln_exp_ratio_1_seed_1234/dt_panda_push_dense_tf.yaml  --dataset datasets/split/panda_push_dense_1m_expert.pkl 

Experiments TODO

  1. Train TQC on all envs.
  2. Generate datasets of sizes 1M, 500k, 250k, 100k, 50k, and 10k.
  3. Train DT on all envs and all dataset sizes.

Repeat on a different seed?
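Step 2 of the TODO list could carve the smaller dataset sizes out of one large dataset. Whether whole episodes or individual transitions get subsampled is an assumption here; this sketch samples episodes:

```python
# Sketch for TODO step 2: carving smaller dataset sizes out of one
# large dataset. Whether the repo subsamples whole episodes or
# individual transitions is an assumption; this version samples episodes.
import random

def subsample(episodes, sizes, seed=0):
    """Return {size: random subset of `episodes`} for each requested size."""
    rng = random.Random(seed)
    return {n: rng.sample(episodes, n) for n in sizes if n <= len(episodes)}

episodes = [{"id": i} for i in range(1000)]      # stand-in dataset
splits = subsample(episodes, [1000, 500, 250, 100, 50, 10])
```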


Train fast using RL Zoo

python -m rl_zoo3.train --env PandaPushDense-v3 --algo tqc --conf-file configs/tqcher_zoo.yaml --folder trained --save-freq 100000 --hyperparams n_envs:4 gradient_steps:-1

n_envs sets how many environments run in parallel; gradient_steps:-1 tells Stable-Baselines3 to do as many gradient steps as environment steps were collected in the rollout.
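A toy illustration of what running several environments in parallel means: every copy is stepped once per call and the results are batched. `ToyEnv` and `VecEnv` are stand-ins, not RL Zoo or Stable-Baselines3 classes:

```python
# Toy illustration of what n_envs means: several environment copies
# stepped in lockstep with batched results. ToyEnv and VecEnv are
# stand-ins, not RL Zoo or Stable-Baselines3 classes.

class ToyEnv:
    def __init__(self, seed):
        self.state = seed

    def step(self, action):
        self.state += action
        return self.state, float(action), False  # obs, reward, done

class VecEnv:
    """Steps every env copy once per call and batches the results."""

    def __init__(self, n_envs):
        self.envs = [ToyEnv(i) for i in range(n_envs)]

    def step(self, actions):
        obs, rewards, dones = zip(
            *(env.step(a) for env, a in zip(self.envs, actions))
        )
        return list(obs), list(rewards), list(dones)

vec = VecEnv(4)                       # like --hyperparams n_envs:4
obs, rewards, dones = vec.step([1, 1, 1, 1])
```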
