This project uses Gym Super Mario Bros as the environment and Stable Baselines, a fork of OpenAI's popular Baselines reinforcement learning library, to train the agent.
It applies a modified reward as the simplest safety constraint to enforce safe behaviour in the agent.
Falling into a pit is treated as the unsafe (catastrophic) state.
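As an illustration of the modified-reward idea, the following is a minimal sketch rather than the code used in this repository. It assumes the classic Gym step API that returns (obs, reward, done, info) and the `y_pos` entry that gym-super-mario-bros reports in `info`. The names `PitPenaltyWrapper`, `pit_penalty` and `pit_y_threshold`, their default values, and the `StubEnv` stand-in are hypothetical and chosen only for this example.

```python
class PitPenaltyWrapper:
    """Subtracts a penalty from the reward when an episode ends in a pit.

    Assumption: a pit fall is approximated as "episode finished while
    Mario's y_pos is below pit_y_threshold". Both the rule and the
    default numbers are illustrative, not taken from this repository.
    """

    def __init__(self, env, pit_penalty=50.0, pit_y_threshold=79):
        self.env = env
        self.pit_penalty = pit_penalty
        self.pit_y_threshold = pit_y_threshold

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        y_pos = info.get("y_pos")
        fell_into_pit = bool(
            done and y_pos is not None and y_pos < self.pit_y_threshold
        )
        if fell_into_pit:
            # Modified reward: make the catastrophic state more costly.
            reward -= self.pit_penalty
        # Copy info so the wrapped environment's dict is not mutated.
        info = dict(info, fell_into_pit=fell_into_pit)
        return obs, reward, done, info


class StubEnv:
    """Hypothetical stand-in that replays scripted transitions."""

    def __init__(self, transitions):
        self._transitions = list(transitions)

    def reset(self):
        return None

    def step(self, action):
        return self._transitions.pop(0)


# One ordinary step, then a terminal step low on the screen.
env = PitPenaltyWrapper(StubEnv([
    (None, 1.0, False, {"y_pos": 79}),
    (None, -15.0, True, {"y_pos": 40}),
]))
env.reset()
_, safe_reward, _, safe_info = env.step(0)
_, pit_reward, _, pit_info = env.step(0)
```

In this sketch the ordinary step keeps its original reward, while the terminal step below the threshold has the penalty subtracted. With the real environment, the wrapper would be placed around the Mario environment before it is handed to the training algorithm.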
Below are the results of experimenting over multiple iterations:
A Python version below 3.8 is required, preferably Python 3.7.6. This research used VS Code with a virtual environment.
Install the dependencies with
pip install -r requirements.txt
The training process is started with
python train.py
The evaluation process is started with
python eval.py