afriyadi-it/saferl-cddqn

Towards Human-Level Safe Reinforcement Learning in Atari Library Environment

This project uses Gym Super Mario Bros as the environment and Stable Baselines, a fork of OpenAI's popular Baselines reinforcement learning library.

It uses a modified reward, the simplest form of safety constraint, to enforce safe behaviour in the agent.

Falling into a pit is treated as the unsafe (catastrophic) state.
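
As a rough illustration of the modified-reward idea, the sketch below wraps an environment and subtracts a penalty the first time Mario drops below ground level in an episode. It is not taken from this repository: the class name `PitPenaltyWrapper`, the `pit_y_threshold` and `penalty` values, and the use of the `y_pos` entry of the `info` dictionary to detect a pit are all assumptions made for the example.

```python
class PitPenaltyWrapper:
    """Hypothetical reward-shaping wrapper (not the repository's code).

    Expects the old Gym step API: step(action) -> (obs, reward, done, info),
    where info contains a "y_pos" entry as gym-super-mario-bros reports it.
    """

    def __init__(self, env, pit_y_threshold=79, penalty=50.0):
        self.env = env
        # Assumed values: tune both for the level being trained on.
        self.pit_y_threshold = pit_y_threshold
        self.penalty = penalty
        self.violations = 0      # episodes in which a pit fall was detected
        self._violated = False   # whether the current episode already violated

    def reset(self, **kwargs):
        self._violated = False
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Treat "below the threshold height" as being inside a pit.
        in_pit = info.get("y_pos", self.pit_y_threshold) < self.pit_y_threshold
        if in_pit and not self._violated:
            # Penalise once per episode so a long fall is not counted repeatedly.
            reward -= self.penalty
            self.violations += 1
            self._violated = True
        info = dict(info, violation=in_pit)
        return obs, reward, done, info
```

Under these assumptions the wrapper would sit around the Super Mario Bros environment before it is handed to the learning algorithm, so that the agent only ever sees the penalised reward.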

Below are the results at different numbers of training iterations:

Safe DDQN

| Iteration | Notes | Reward | Violations | Completion rate |
|---|---|---|---|---|
| 100k | Still fails most of the time | 629.2 | 20 | 0% |
| 500k | Mario shows hesitation | 1001.7 | 19 | 0% |
| 1M | Proceeds smoothly but ends with a violation | 921.5 | 24 | 0% |
| 5M | Agent freezes out of fear of the pit | 2331 | 18 | 18% |
| 10M | Mostly wins without problems | 2703.9 | 2 | 71% |

DDQN

| Iteration | Notes | Reward | Violations | Completion rate |
|---|---|---|---|---|
| 100k | Can avoid enemies but violates safety | 679.6 | 24 | 0% |
| 500k | Still violates safety, with occasional wins | 1151.4 | 23 | 0% |
| 1M | Farthest run recorded for the model | 700.2 | 27 | 0% |
| 5M | Completes the level but gets stuck for a while | 2755.5 | 24 | 47% |
| 10M | Mostly completes the level without problems | 2637.5 | 18 | 62% |

PPO

| Iteration | Notes | Reward | Violations | Completion rate |
|---|---|---|---|---|
| 100k | Good start, then mostly gets stuck at a pipe | 295.5 | 0 | 0% |
| 500k | Mostly fails | 719.7 | 23 | 0% |
| 1M | Still fails quickly, but the jump pattern changes slightly | 165.5 | 0 | 0% |
| 5M | Somewhat smooth but still fails | 815.3 | 30 | 0% |
| 10M | Makes progress, but not much | 858.7 | 32 | 0% |

Setup

The Python version must be below 3.8, preferably Python 3.7.6. This research used VS Code with a virtual environment.

pip install -r requirements.txt

Training

The training process is started with

python train.py

Evaluation

The evaluation process is started with

python eval.py
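
The results above report three numbers per checkpoint: reward, violation and completion rate. The sketch below shows one way such numbers could be aggregated from per-episode records. It is an illustration only and does not describe what `eval.py` actually does: the function name `summarize_episodes`, the record layout, and the choices to average the reward and to count at most one violation per episode are all assumptions.

```python
def summarize_episodes(episodes):
    """Aggregate hypothetical per-episode records into three summary metrics.

    Each record is assumed to be a dict with:
      "reward":    total reward collected in the episode (float)
      "violation": True if Mario fell into a pit during the episode
      "flag_get":  True if the level was completed (flag reached)
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("at least one episode is required")

    mean_reward = sum(e["reward"] for e in episodes) / n
    violations = sum(1 for e in episodes if e["violation"])
    completed = sum(1 for e in episodes if e["flag_get"])

    return {
        "reward": mean_reward,                        # assumed: mean over episodes
        "violation": violations,                      # assumed: count of unsafe episodes
        "completion_rate": 100.0 * completed / n,     # percentage of completed episodes
    }


# Example with two made-up episodes (values are illustrative, not measured results).
summary = summarize_episodes([
    {"reward": 100.0, "violation": True, "flag_get": False},
    {"reward": 300.0, "violation": False, "flag_get": True},
])
```

With these two illustrative episodes the summary has a reward of 200.0, one violation and a completion rate of 50.0.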
