This repository contains a PyTorch implementation of the paper "Maximum Entropy Deep Inverse Reinforcement Learning" by Wulfmeier et al. [1].
It also contains an implementation of the deterministic value iteration algorithm described by Sutton and Barto [2], along with a Gymnasium environment for the Windy Gridworld problem.
The notebooks directory contains a notebook that explains the Maximum Entropy Deep Inverse Reinforcement Learning algorithm and its PyTorch implementation on the toy MDP environment.
This toy example is a simple environment with two states and two actions. The agent starts in state 0 and can either change state or stay in the same state. The reward is 3 if the agent stays in state 0 and -1 if it changes state. The environment is stochastic: an attempt to change state succeeds with probability 1, whereas an attempt to stay succeeds with probability 0.5 and otherwise forces the agent to move to the other state.
Adapted from Bert Huang's video on MDPs [3].
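The toy MDP is small enough to write down as arrays. The sketch below is only an illustration and is not the code shipped in this repository: the action encoding (0 = stay, 1 = change), the reward of 0 for staying in state 1 (the description above does not specify it), the discount factor, and the names `P`, `R` and `value_iteration` are all assumptions.

```python
import numpy as np

# Assumed action encoding (not taken from the repository).
STAY, CHANGE = 0, 1

# P[s, a, s'] = probability of landing in s' after taking action a in state s.
P = np.zeros((2, 2, 2))
for s in range(2):
    other = 1 - s
    P[s, STAY, s] = 0.5       # staying succeeds half of the time
    P[s, STAY, other] = 0.5   # otherwise the agent is pushed to the other state
    P[s, CHANGE, other] = 1.0 # changing state always succeeds

# R[s, s'] = reward for the transition s -> s'.
# R[1, 1] = 0 is an assumption: the description does not give this value.
R = np.array([[3.0, -1.0],
              [-1.0, 0.0]])


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return the optimal state values and a greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = sum over s' of P[s, a, s'] * (R[s, s'] + gamma * V[s'])
        Q = np.einsum("sat,st->sa", P, R + gamma * V[None, :])
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)


V, policy = value_iteration(P, R)
print("state values:", V)
print("greedy policy:", policy)
```

Under these assumptions, the greedy policy tries to stay in state 0 and switches back from state 1. A different reward for staying in state 1 or a different discount factor may change that result.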
This environment is a deterministic gridworld with a deterministic wind that pushes the agent by the number of cells indicated in the grid. The agent starts in the top-left corner and has to reach the bottom-right corner. The reward is -1 for each step and 0 for reaching the goal state where the episode ends.
Adapted from Sutton and Barto's book on Reinforcement Learning [2].
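The dynamics described above can be sketched as a single transition function. This is a hypothetical illustration, not the Gymnasium environment provided by this repository: the grid size, the per-column wind strengths in `WIND`, the upward wind direction (as in Sutton and Barto's original example), the action encoding, and the `step` function are all assumptions.

```python
# Hypothetical 4 x 5 grid; row 0 is the top row.
ROWS, COLS = 4, 5
WIND = (0, 1, 1, 0, 0)          # assumed wind strength for each column
START = (0, 0)                  # top-left corner
GOAL = (ROWS - 1, COLS - 1)     # bottom-right corner

# Assumed action encoding: 0 = up, 1 = down, 2 = left, 3 = right.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(state, action):
    """Apply one deterministic move plus wind; return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    # The wind of the column the agent leaves pushes it upward (toward row 0).
    new_row = row + d_row - WIND[col]
    new_col = col + d_col
    # Clip to the grid so the agent cannot leave it.
    new_row = min(max(new_row, 0), ROWS - 1)
    new_col = min(max(new_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    done = next_state == GOAL
    reward = 0.0 if done else -1.0  # -1 per step, 0 on reaching the goal
    return next_state, reward, done


# Roll out a fixed action sequence from the start state.
state, total_reward, done = START, 0.0, False
for action in [1, 1, 1, 3, 3, 3, 3, 1, 1]:
    state, reward, done = step(state, action)
    total_reward += reward
    if done:
        break
print(state, total_reward, done)
```

Because both the moves and the wind are deterministic, each state-action pair has exactly one successor, which is what allows a deterministic value iteration to be applied to this environment.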
In the repository directory, run the following command to install the package.
pip install .
To contribute to the project, you can clone the repository and install the required dependencies in a virtual environment. You can do so by running the following commands in the repository directory.
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # PowerShell
.\.venv\Scripts\activate.bat # Windows cmd
source .venv/bin/activate # Ubuntu
python -m pip install --upgrade pip
pip install -e . # Install the package in development mode
pip install -r ./requirements.txt
Note: You might need the following for PowerShell:
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
Note 2: To use GPUs with PyTorch, install the PyTorch build that matches your hardware and make sure it replaces the version installed from requirements.txt.
The following command will train a Maximum Entropy Deep Inverse Reinforcement Learning model on the Windy Gridworld environment and plot the results.
python examples/
The following command will train a Maximum Entropy Deep Inverse Reinforcement Learning model on the toy MDP environment and plot the results.
python examples/
The following command will run value iteration on the Windy Gridworld environment and plot the results.
python examples/
[1] M. Wulfmeier, P. Ondruska, and I. Posner, "Maximum Entropy Deep Inverse Reinforcement Learning." arXiv, 11 March 2016. doi: 10.48550/arXiv.1507.04888.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., Adaptive Computation and Machine Learning series. Cambridge, Massachusetts: The MIT Press, 2018.
[3] Markov Decision Processes, (12 February 2015). Consulted on: 22 May 2024. [Online video]. Available on: