RL gym

Implementations of several reinforcement learning algorithms, each tested on one or more Gym environments.

These algorithms are implemented in this repo:

A2C
DDPG
Double DQN
Dueling DQN
TD3

They are tested on these environments:

Cartpole
Pendulum
Acrobot
Lunar Lander Continuous

A2C

A2C is an on-policy, model-free reinforcement learning algorithm. Here is the pseudocode for A3C, its asynchronous counterpart, which is very similar to A2C.
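
As a rough illustration of the advantage actor-critic update, the sketch below computes discounted returns, advantages, and the two loss terms for one rollout. It is a minimal pure-Python sketch, not this repo's code: the function names (`discounted_returns`, `advantages`, `a2c_losses`) and the default `gamma` are assumptions chosen for illustration.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Return the discounted return R_t for every step of a rollout.

    bootstrap_value is the critic's estimate V(s_T) for the state after the
    last step; use 0.0 when the episode terminated.
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage A_t = R_t - V(s_t): how much better the rollout did than predicted."""
    return [ret - val for ret, val in zip(returns, values)]


def a2c_losses(log_probs, returns, values):
    """Actor (policy-gradient) and critic (squared-error) losses, averaged over steps.

    In an autodiff framework the advantage is treated as a constant in the
    actor loss; here everything is a plain float, so no detaching is needed.
    """
    advs = advantages(returns, values)
    n = len(advs)
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advs)) / n
    critic_loss = sum(adv ** 2 for adv in advs) / n
    return actor_loss, critic_loss
```

A2C typically gathers such rollouts from several environment copies in lockstep and applies one synchronous update, whereas A3C lets each worker update the shared parameters asynchronously.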

Agent trained using A2C playing the Acrobot game.

DDPG

DDPG is an off-policy, model-free reinforcement learning algorithm for continuous action spaces. Here is the pseudocode for DDPG.
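
Two pieces of the DDPG update that are easy to get wrong are the critic's bootstrapped target and the slow "soft" update of the target networks. The sketch below shows both on plain Python floats. The function names and the default `tau` and `gamma` values are illustrative assumptions, not taken from this repo.

```python
def ddpg_critic_target(reward, done, next_q_target, gamma=0.99):
    """Critic regression target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q_target is the target critic's value for the next state, evaluated
    at the action chosen by the target actor.
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * next_q_target


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move each target weight a fraction tau toward the online weight."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]
```

A small `tau` keeps the target networks changing slowly, which stabilizes the bootstrapped critic target.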


Agent trained using DDPG playing the Lunar Lander game.

Double_DQN

Double DQN is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for Double DQN.
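
The key difference from vanilla DQN is in the target: the online network selects the next action and the target network evaluates it, which reduces overestimation of Q-values. The following is a minimal sketch on plain lists of Q-values; `double_dqn_target` and its parameter names are hypothetical, chosen only for illustration.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for a single transition.

    q_online_next: Q-values of the online network for the next state.
    q_target_next: Q-values of the target network for the same next state.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]
```

Vanilla DQN would instead take `max(q_target_next)`, using the same network to both select and evaluate the action.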

Agent trained using Double DQN playing the Cartpole game.

Dueling_DQN

Similar to DDQN, the dueling network contains two separate estimators: one for the state-value function and one for the state-dependent action-advantage function.

Formula for the decomposition of the Q-value:
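
Assuming the mean-subtracted aggregation from the dueling-network paper (the max-subtracted variant is the other common choice, and this repo may use either), the decomposition can be written as:

```latex
Q(s, a; \theta, \alpha, \beta)
  = V(s; \theta, \beta)
  + \left( A(s, a; \theta, \alpha)
  - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right)
```

Here $|\mathcal{A}|$ is the number of discrete actions.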

  • θ denotes the parameters shared by both streams of the network.
  • α parameterizes the output stream for the advantage function A.
  • β parameterizes the output stream for the value function V.
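
A minimal sketch of how the two output streams might be combined into Q-values, assuming the mean-subtracted aggregation (an assumption; the max-subtracted form is also used in practice). The function name `dueling_q_values` is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')).

    state_value: scalar output of the value stream.
    advantages:  one output of the advantage stream per action.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (adv - mean_advantage) for adv in advantages]
```

Subtracting the mean makes the split identifiable: the Q-values average to V(s), so the network cannot shift an arbitrary constant between the two streams.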

Agent trained using Dueling DQN playing the Acrobot game.

TD3

TD3 is an off-policy, model-free reinforcement learning algorithm that extends DDPG. Here is the pseudocode for TD3.
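
TD3 modifies DDPG in three ways: it takes the minimum of two critics when forming the target, adds clipped noise to the target action, and updates the actor less often than the critics. The sketch below illustrates the first two on plain floats. All names and default values (`noise_std`, `noise_clip`, the action bounds) are assumptions for illustration, not this repo's settings.

```python
import random


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action range."""
    noise = random.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, target_action + noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics.

    q1_next and q2_next are the two target critics' values for the next state,
    evaluated at the smoothed target action.
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)
```

The third change, delayed policy updates, is a scheduling choice in the training loop (for example, one actor update per two critic updates) and is not shown here.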


Agent trained using TD3 playing the Pendulum game.

References