Each experiment uses 3 seeds and is trained for 10k environment steps. The QR-DQN parameters are the same as those described in the original paper.
coach -p Atari_QR_DQN -lvl breakout
coach -p Atari_QR_DQN -lvl pong
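
The commands above each launch a single run. As a rough sketch of how the three seeds per level might be enumerated, the loop below prints one command per seed rather than starting training. The seed values 0, 1, 2 and the helper name `print_runs` are illustrative assumptions, and `--seed` is assumed to be the Coach option for fixing the seed; confirm it with `coach --help` before relying on it.

```shell
# Dry-run sketch: print one command per seed instead of launching training.
# Assumption: seeds 0, 1, 2 are illustrative; the text only says "3 seeds".
# Assumption: `--seed` is the Coach flag that sets the random seed.
print_runs() {
    level="$1"
    for seed in 0 1 2; do
        echo "coach -p Atari_QR_DQN -lvl ${level} --seed ${seed}"
    done
}

print_runs breakout
print_runs pong
```

Dropping the `echo` (or piping the output to `sh`) would actually start the runs, one after another.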