RL gym

Implementations of several reinforcement learning algorithms, each tested on one or more Gym environments.

These algorithms are implemented in this repo:

A2C
DDPG
Double DQN
Dueling DQN
TD3

They are tested on these environments:

CartPole
Pendulum
Acrobot
Lunar Lander Continuous

A2C

A2C is an on-policy, model-free reinforcement learning algorithm. Here is the pseudocode for A3C, which is nearly identical to A2C.
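Separately from that pseudocode, the core signal an advantage actor-critic method trains on can be sketched in a few lines. This is a minimal plain-Python illustration, not the code in this repo: the function names `discounted_returns` and `advantages`, the reward and value numbers, and the discount factor are all assumptions chosen for readability.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, computed backwards.

    bootstrap_value is the critic's estimate for the state after the last
    step (use 0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimate: observed return minus the critic's prediction."""
    return [g - v for g, v in zip(returns, values)]


# Hypothetical rollout of three steps.
rewards = [1.0, 1.0, 1.0]
values = [2.0, 1.5, 1.0]  # critic predictions for the visited states
returns = discounted_returns(rewards, bootstrap_value=0.0, gamma=0.5)
adv = advantages(returns, values)
print(returns)  # [1.75, 1.5, 1.0]
print(adv)      # [-0.25, 0.0, 0.0]
```

The actor is then pushed towards actions with positive advantage, while the critic is regressed towards the returns.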

Agent trained using A2C playing the Acrobot game.

DDPG

DDPG is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for DDPG.
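Two pieces of DDPG are easy to show in isolation: the bootstrapped critic target and the soft (Polyak) update of the target networks. The sketch below uses plain Python lists in place of network weights; the names `td_target` and `soft_update` and the values of `gamma` and `tau` are assumptions for illustration, not taken from this repo.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic target: r + gamma * Q_target(s', mu_target(s')).

    next_q is assumed to be the target critic's value for the next state
    and the target actor's action; it is ignored when the episode is done.
    """
    return reward + gamma * (0.0 if done else next_q)


def soft_update(target_weights, online_weights, tau=0.005):
    """Move each target weight a fraction tau towards the online weight."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


print(td_target(1.0, 2.0, done=False, gamma=0.5))      # 2.0
print(td_target(1.0, 2.0, done=True, gamma=0.5))       # 1.0
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))    # [0.5, 1.0]
```

A small `tau` keeps the target networks changing slowly, which tends to stabilise the critic's regression target.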

Agent trained using DDPG playing the Lunar Lander Continuous game.

Double_DQN

Double DQN is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for Double DQN.
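The defining step of Double DQN is that the online network picks the next action while the target network evaluates it. A minimal plain-Python sketch of that target follows; the function name `double_dqn_target` and the Q-value lists are hypothetical stand-ins for network outputs.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for a single transition.

    q_online_next: online network's Q-values for the next state.
    q_target_next: target network's Q-values for the same next state.
    """
    if done:
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but its value is read from the target network.
    return reward + gamma * q_target_next[best_action]


q_online_next = [1.0, 3.0]   # online net prefers action 1
q_target_next = [2.0, 0.5]   # target net would prefer action 0
print(double_dqn_target(1.0, False, q_online_next, q_target_next, gamma=0.5))  # 1.25
```

A vanilla DQN target would use `max(q_target_next)` instead, giving 2.0 in this example; decoupling selection from evaluation is what reduces the overestimation.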

Agent trained using Double DQN playing the CartPole game.

Dueling_DQN

Like Double DQN, Dueling DQN builds on DQN, but its network contains two separate estimators: one for the state value function and one for the state-dependent action advantage function.

Formula for the decomposition of the Q-value:
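In the notation of Wang et al., the decomposition can be written as below. This assumes the mean-subtracted variant; the paper also gives a max-based one, and which of the two this repo uses is not stated here. |𝒜| is the number of actions.

```latex
Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right)
```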

θ denotes the parameters shared by both streams of the network.
α parameterizes the output stream for the advantage function A.
β parameterizes the output stream for the value function V.
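To make the two-stream idea concrete, here is a small plain-Python sketch of how a value estimate and per-action advantages could be combined into Q-values. It assumes the mean-subtracted aggregation from Wang et al.; the function name `dueling_q_values` and the numbers are hypothetical, and a real network would produce both inputs from its two output streams.

```python
def dueling_q_values(state_value, action_advantages):
    """Combine V(s) and A(s, a) into Q(s, a) for every action.

    Subtracting the mean advantage keeps V and A identifiable: adding a
    constant to every advantage leaves the resulting Q-values unchanged.
    """
    mean_advantage = sum(action_advantages) / len(action_advantages)
    return [state_value + (a - mean_advantage) for a in action_advantages]


print(dueling_q_values(2.0, [1.0, 3.0]))  # [1.0, 3.0]
print(dueling_q_values(2.0, [2.0, 4.0]))  # [1.0, 3.0], same after a constant shift
```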

Agent trained using Dueling DQN playing the Acrobot game.

TD3

TD3 is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for TD3.
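Two of the changes TD3 makes to DDPG fit in a short sketch: target policy smoothing (clipped noise added to the target action) and clipped double-Q learning (taking the smaller of two target critics). The code below is plain Python for a scalar action; the helper names, the noise settings, and the action bounds of -1.0 and 1.0 are assumptions for illustration and are not taken from this repo.

```python
import random


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           act_low=-1.0, act_high=1.0, rng=random):
    """Add clipped Gaussian noise to the target actor's action."""
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    return clip(target_action + noise, act_low, act_high)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Critic target using the minimum of the two target critics."""
    return reward + gamma * (0.0 if done else min(q1_next, q2_next))


next_action = smoothed_target_action(0.9)  # always stays within [-1.0, 1.0]
print(td3_target(1.0, False, q1_next=2.0, q2_next=4.0, gamma=0.5))  # 2.0
```

The third change, updating the actor and the target networks less often than the critics, is a matter of training-loop scheduling and is not shown here.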

Agent trained using TD3 playing the Pendulum game.

References

Reinforcement Learning - Goal Oriented Intelligence
Reinforcement Learning by Sentdex
OpenAI's Spinning Up Docs
Wang et al., Dueling Network Architectures for Deep Reinforcement Learning
Dueling Deep Q Networks
Deriving Policy Gradients and Implementing REINFORCE
Understanding Actor Critic Methods and A2C
Keras DDPG Example
