
Proximal Policy Optimization - with Keras

This is an implementation of the PPO algorithm. The agent moves in the environment by controlling both its angular and its linear velocity. The same action space is also used in the multi-agent version.

Summary

The project implements the clipped version of the Proximal Policy Optimization algorithm described in https://arxiv.org/pdf/1707.06347.pdf.
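
For reference, the clipped surrogate objective from the paper can be sketched as a TensorFlow loss roughly as follows. This is a minimal illustration with plain TensorFlow ops; the function and argument names are illustrative and not taken from this repository.

import tensorflow as tf

# Minimal sketch of the PPO clipped surrogate loss (illustrative names, not this repo's code)
def ppo_clipped_loss(advantage, old_log_prob, new_log_prob,
                     epsilon=0.2, entropy=None, entropy_coef=1e-3):
    # probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = tf.exp(new_log_prob - old_log_prob)
    # unclipped and clipped surrogate terms
    surrogate1 = ratio * advantage
    surrogate2 = tf.clip_by_value(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # PPO maximizes the minimum of the two terms; as a loss we minimize its negative
    loss = -tf.reduce_mean(tf.minimum(surrogate1, surrogate2))
    # optional entropy bonus to encourage exploration
    if entropy is not None:
        loss -= entropy_coef * tf.reduce_mean(entropy)
    return loss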

The config.yaml file defines some of the hyperparameters used in the various implementations. These parameters are initialized with the values proposed in the paper.

The key points:

  • Loss function parameters:

    • epsilon = 0.2
    • gamma = 0.99
    • entropy loss = 1e-3
  • Network size:

    • state_size = 27 (25 laser scan values + target heading + target distance)
    • action_size (angular velocity) = 5
    • action_size2 (linear velocity) = 3
    • batch_size = 64
    • output layer = 8 nodes split into 2 streams (5 nodes for angular and 3 for linear velocity)
    • lossWeights for the output layer:
      • 0.5
      • 0.5

The loss-weight values are the result of testing; with an equal weighting the success rate was lower.
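
As a rough illustration, a two-stream network of the sizes listed above, with weighted output losses, could be wired in Keras roughly as below. The hidden-layer sizes, optimizer, and placeholder losses are assumptions; the project's actual model is trained with the PPO loss described above.

from tensorflow.keras import layers, Model

# Hypothetical two-stream policy head matching the sizes listed above
state_input = layers.Input(shape=(27,), name="state")        # 25 laser scans + heading + distance
hidden = layers.Dense(128, activation="relu")(state_input)   # hidden size is an assumption
hidden = layers.Dense(128, activation="relu")(hidden)

angular_out = layers.Dense(5, activation="softmax", name="angular")(hidden)  # action_size
linear_out = layers.Dense(3, activation="softmax", name="linear")(hidden)    # action_size2

model = Model(inputs=state_input, outputs=[angular_out, linear_out])
model.compile(optimizer="adam",
              loss={"angular": "categorical_crossentropy",   # placeholder; the project uses the PPO loss
                    "linear": "categorical_crossentropy"},
              loss_weights={"angular": 0.5, "linear": 0.5})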

Prerequisites

  • Python 3
  • TensorFlow
  • NumPy, matplotlib, scipy
  • Keras
  • Unity
# create conda environment named "tensorflow"
conda create -n tensorflow pip python=3.6

# activate conda environment
conda activate tensorflow

# Tensorflow
pip install tensorflow

# Keras
pip install keras

Training

To start the training, run main.py inside the Anaconda environment:

conda activate tensorflow
python main.py

Future work

The work currently under development uses a single policy network shared among the agents. In addition, each agent has its own critic network, similar to the one used in the previous implementations.
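
A minimal sketch of that setup, assuming three agents and illustrative layer sizes (none of the names below come from this repository):

from tensorflow.keras import layers, Model

def build_shared_actor(state_size=27):
    # one policy network shared by every agent, with the two velocity streams
    s = layers.Input(shape=(state_size,))
    h = layers.Dense(128, activation="relu")(s)
    angular = layers.Dense(5, activation="softmax", name="angular")(h)
    linear = layers.Dense(3, activation="softmax", name="linear")(h)
    return Model(s, [angular, linear], name="shared_actor")

def build_critic(state_size=27):
    # per-agent state-value estimator
    s = layers.Input(shape=(state_size,))
    h = layers.Dense(128, activation="relu")(s)
    value = layers.Dense(1, name="value")(h)
    return Model(s, value)

shared_actor = build_shared_actor()           # single actor used by all agents
critics = [build_critic() for _ in range(3)]  # one critic per agent (3 agents assumed)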

The results obtained in the various tests were not positive: after several episodes the agents started to behave incorrectly. With the proposed solution it was possible to reach a good success rate, but not consistently.

After a certain number of episodes, the agents begin to adopt repetitive behaviors, which leads to one of two outcomes:

  • one of the agents consistently reaches the goal while the others do not move;
  • all agents remain stationary.
