- Add your policy to .
- Add your environment to .
- Run the generation module:
generate_dataset.py [-h] [-t POLICY_TYPE] [-p POLICY_PATH] [-e ENV] [-n NUM_EPISODES] [-o OUTPUT_PATH] [--render] [--seed SEED]
python generate_dataset.py -t random -e hopper -n 1000 -o cache/hopper.pkl --render --seed 0
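For reference, the usage line above can be reproduced with a small `argparse` parser. This is a minimal sketch, not the repository's actual code: only the flag names and metavars come from the usage line, while the types, destinations, and help strings are assumptions. The `-w` flag used in later commands is not part of that usage line, so it is left out here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags match the usage line shown above.

    Types, dest names, and help strings are assumptions made for this sketch.
    """
    parser = argparse.ArgumentParser(prog="generate_dataset.py")
    parser.add_argument("-t", dest="policy_type", help="policy type, e.g. random")
    parser.add_argument("-p", dest="policy_path", help="path to a saved policy")
    parser.add_argument("-e", dest="env", help="environment name, e.g. hopper")
    parser.add_argument("-n", dest="num_episodes", type=int, help="number of episodes")
    parser.add_argument("-o", dest="output_path", help="where to write the dataset")
    parser.add_argument("--render", action="store_true", help="render while generating")
    parser.add_argument("--seed", type=int, help="random seed")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    "-t random -e hopper -n 1000 -o cache/hopper.pkl --render --seed 0".split()
)
print(args.policy_type, args.env, args.num_episodes, args.output_path, args.render, args.seed)
```

Flags that are omitted on the command line default to `None` (or `False` for `--render`) in this sketch; the real script may use different defaults.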
gsutil -m cp -R gs://atari-replay-datasets/dqn/Breakout/ ./cache/
python -m experiment.atari --seed 1234 --context-length 30 --epochs 5 --model-type reward_conditioned --num-steps 500000 --num-buffers 50 --game Breakout --batch-size 128
Generate the PandaReach datasets. The demonstrator needs the time-feature wrapper.
python generate_dataset.py -t reach -e panda-reach-dense -n 100000 -o datasets/panda_reach_dense_random.pkl -w time-feature
python generate_dataset.py -t reach -p demonstrators/tqcher_panda_reach_dense_tf.zip -e panda-reach-dense -n 100000 -o datasets/panda_reach_dense_expert.pkl -w time-feature
Generate the PandaPush dataset. The demonstrator needs the time-feature wrapper.
python generate_dataset.py -t tqc+her -p demonstrators/sb3_tqc_panda_push_sparse.zip -e panda-push-sparse -n 100000 -o datasets/panda_push_sparse_100k_expert.pkl -w time-feature
Train Decision-Transformer on PandaPush.
python train_single.py -c configs/ICRA_1mln_exp_ratio_1_seed_1234/dt_panda_push_dense_tf.yaml --dataset datasets/split/panda_push_dense_1m_expert.pkl
- Train TQC on all envs.
- Generate datasets of sizes 1m, 500k, 250k, 100k, 50k, and 10k.
- Train DT on all envs and all dataset sizes.
Repeat on a different seed?
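The dataset-generation step of this plan can be scripted. The sketch below prints one `generate_dataset.py` command per environment and dataset size. It assumes that the size label equals the value passed to `-n` and that outputs follow the `datasets/<env>_<size>_expert.pkl` naming of the PandaPush command above. The policy types and demonstrator paths are copied from the commands in this README; `DEMONSTRATORS`, `SIZES`, and `generation_commands` are names made up for this example.

```python
import shlex

# Size labels from the plan above, mapped to the value passed to -n (assumption).
SIZES = {
    "1m": 1_000_000,
    "500k": 500_000,
    "250k": 250_000,
    "100k": 100_000,
    "50k": 50_000,
    "10k": 10_000,
}

# Environment -> (policy type, demonstrator path), taken from the commands above.
# Extend this mapping with further environments as needed.
DEMONSTRATORS = {
    "panda-reach-dense": ("reach", "demonstrators/tqcher_panda_reach_dense_tf.zip"),
    "panda-push-sparse": ("tqc+her", "demonstrators/sb3_tqc_panda_push_sparse.zip"),
}


def generation_commands():
    """Return one argument list per (environment, dataset size) pair."""
    commands = []
    for env, (policy_type, policy_path) in DEMONSTRATORS.items():
        for label, num_episodes in SIZES.items():
            output = f"datasets/{env.replace('-', '_')}_{label}_expert.pkl"
            commands.append([
                "python", "generate_dataset.py",
                "-t", policy_type,
                "-p", policy_path,
                "-e", env,
                "-n", str(num_episodes),
                "-o", output,
                "-w", "time-feature",
            ])
    return commands


for command in generation_commands():
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` directly.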
python -m rl_zoo3.train --env PandaPushDense-v3 --algo tqc --conf-file configs/tqcher_zoo.yaml --folder trained --save-freq 100000 --hyperparams n_envs:4 gradient_steps:-1
n_envs sets how many environments run in parallel (4 in the command above).
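The `--hyperparams` values are `key:value` pairs that override entries from the config file. As an illustration only (this is not rl_zoo3's own parsing code, and `parse_overrides` is a hypothetical helper), such pairs could be turned into a dictionary like this:

```python
import ast


def parse_overrides(pairs):
    """Convert 'key:value' strings into a dict with Python-typed values."""
    overrides = {}
    for pair in pairs:
        # Split on the first colon only, so values may contain colons themselves.
        key, _, raw_value = pair.partition(":")
        overrides[key] = ast.literal_eval(raw_value)
    return overrides


overrides = parse_overrides(["n_envs:4", "gradient_steps:-1"])
print(overrides)
```

`ast.literal_eval` is used here so that only plain literals (numbers, strings, lists, and so on) are accepted.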