This is a self-contained repository that explains two basic Reinforcement Learning (RL) algorithms, namely Policy Gradient (PG) and Q-learning, and shows how to apply them to control problems. Dynamical systems may have a discrete action space, like the cartpole where the two possible actions are +1 and -1, or a continuous action space, like linear Gaussian systems. Usually, you can find code for only one of these cases, and it might not be obvious how to extend one to the other.
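To make the distinction concrete, here is a minimal sketch (not the repo's code) of how an action is drawn from a parameterized policy in the two cases: a softmax over a finite set of actions for the discrete case, and a Gaussian for the continuous case.

```python
import numpy as np

def sample_discrete_action(logits):
    """Discrete action space (e.g. cartpole): softmax over a finite set of actions."""
    probs = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

def sample_continuous_action(mean, std):
    """Continuous action space (e.g. linear Gaussian systems): Gaussian policy."""
    return np.random.normal(mean, std)

# Toy usage: two discrete actions vs. a scalar continuous action
a_disc = sample_discrete_action(np.array([0.3, -0.1]))
a_cont = sample_continuous_action(mean=0.0, std=1.0)
```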
In this repository, we explain how to formulate PG and Q-learning for each of these cases. We provide implementations of these algorithms for both cases as Jupyter notebooks. You can also find the pure code for these algorithms (and a few more algorithms that I have implemented but not discussed). The code is easy to follow and read. It is written in a modular way, so that, for example, a reader interested in the implementation of an algorithm is not distracted by how the environment is defined in gym, how the results are plotted, and so on. The theoretical material in this repo is summarized in a handout, which can be downloaded from arXiv: https://arxiv.org/abs/2103.04910.
Here is a BibTeX entry that you can use to cite the handout in a publication:
```
@misc{yaghmaie2021crash,
  title={A Crash Course on Reinforcement Learning},
  author={Farnaz Adib Yaghmaie and Lennart Ljung},
  year={2021},
  eprint={2103.04910},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
If you use this repo, please consider citing the following relevant papers:
- F. Adib Yaghmaie, S. Gunnarsson and F. L. Lewis, "Output Regulation of Unknown Linear Systems using Average Cost Reinforcement Learning", Automatica, Vol. 110, 2019.
- F. Adib Yaghmaie and F. Gustafsson, "Using Reinforcement Learning for Model-free Linear Quadratic Control with Process and Measurement Noises", In 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 6510-6517.
- F. Adib Yaghmaie and S. Gunnarsson, "A New Result on Robust Adaptive Dynamic Programming for Uncertain Partially Linear Systems", In 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 7480-7485.
This repository contains presentation files and code.
The presentation files are related to the LINK-SIC workshop on Reinforcement Learning. The first day will be Friday March 12, 2021, 13.15 - 16.30, and the second day will be Tuesday April 6, 2021, 13.15 - 16.30. You can find the presentation files as PDFs in the folder presentation.
The code is given as Jupyter notebooks and Python files. If you want to run the Jupyter notebooks, I suggest using Google Colab. If you want to extend the results and examine more systems, I suggest cloning this repository and running it on your computer.
- Go to [https://colab.research.google.com/notebooks/intro.ipynb] and sign in with a Google account.
- Click File, and Upload notebook. If you get the webpage in Swedish, click Arkiv and then Ladda upp anteckningsbok.
- Select GitHub and paste the following link [https://github.com/FarnazAdib/Crash_course_on_RL.git].
- Then, a list of files with type .ipynb appears. These are Jupyter notebooks. Jupyter notebooks can contain both text and code, and the code can be run. As an example, scroll down and open pg_on_cartpole_notebook.ipynb.
- The file contains some cells with text and some cells with code. The cells which contain code have $[ ]$ on the left. If you move your mouse over $[ ]$, a play button appears. You can click on it to run the cell. Make sure not to skip a cell, as doing so causes fatal errors.
- You can continue like this and run all code cells one by one up to the end.
- Go to [https://github.com/FarnazAdib/Crash_course_on_RL.git] and clone the project.
- Open PyCharm. In PyCharm, click File and then Open project. Then, navigate to the project folder.
- Follow the Preparation notebook to build a virtual environment and import the required libraries.
The theoretical material in this repo is nicely summarized in our handout, available in PDF format at https://arxiv.org/abs/2103.04910. If you wish to read the material in this repo, you can start by reading about Reinforcement Learning.
You can read about the dynamical systems (or environments, in RL terminology) that we consider in this repo here.
- Cartpole: an environment with a discrete action space
- Linear Gaussian: an environment with a continuous action space
Policy Gradient (PG) is one of the popular RL routines; it relies upon optimizing the policy directly. Below, you can see Jupyter notebooks regarding the PG algorithm; a minimal sketch of the PG update appears after the code links below.
You can also see the pure code for PG:
- PG pure code
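As a rough illustration of the idea (not the implementation used in the notebooks above), the following numpy sketch performs one REINFORCE-style gradient-ascent step for a linear softmax policy; the parameter shapes, learning rate, and toy data are arbitrary choices made for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                              # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, states, actions, rewards, lr=0.01, gamma=0.99):
    """One gradient-ascent step on log pi(a|s), weighted by the return-to-go."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):      # discounted return-to-go
        running = rewards[t] + gamma * running
        G[t] = running
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, G):
        probs = softmax(theta.T @ s)             # action probabilities under the policy
        dlog = -np.outer(s, probs)               # d log pi(a|s) / d theta, all actions
        dlog[:, a] += s                          # extra term for the taken action
        grad += g * dlog
    return theta + lr * grad                     # ascend the expected return

# Toy usage with synthetic one-episode data (4-dim state, 2 actions)
theta = np.zeros((4, 2))
states = [np.random.randn(4) for _ in range(5)]
actions = [np.random.randint(2) for _ in range(5)]
rewards = [1.0] * 5
theta = reinforce_update(theta, states, actions, rewards)
```

The key point is that the gradient of the log-policy is weighted by the return, so actions that were followed by high return become more probable.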
Q-learning is another popular RL routine; it relies upon dynamic programming. Below, you can see Jupyter notebooks regarding the Q-learning algorithm; a minimal sketch of the core update appears after the list below.
- Explanation of Q-learning
- Explanation of experience replay Q-learning
- How to code experience replay Q-learning for systems with a discrete action space (cartpole)
- We have not implemented experience replay Q-learning on the LQ problem because plain Q-learning already performs very well on the LQ problem. Note that, as the explanation of experience replay Q-learning shows, this algorithm adds only two simple functions to plain Q-learning, and those functions do not depend on whether the action space is discrete or continuous. So, the extension is quite straightforward; a sketch of these two functions appears after the pure-code links below.
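For reference, here is a minimal tabular sketch of the Q-learning update together with an epsilon-greedy behavior policy (illustrative only; the notebooks above contain the implementations used in this repo for the cartpole and LQ problems). The table size and hyperparameters are arbitrary.

```python
import numpy as np

n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))        # tabular Q-function
alpha, gamma, eps = 0.1, 0.99, 0.1         # step size, discount, exploration rate

def q_update(Q, s, a, r, s_next, done):
    """One dynamic-programming style step toward the Bellman target."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def eps_greedy(Q, s):
    """Behavior policy: explore with probability eps, otherwise act greedily."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(Q[s].argmax())

# Toy usage with a synthetic transition
Q = q_update(Q, s=0, a=eps_greedy(Q, 0), r=1.0, s_next=1, done=False)
```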
You can also see the pure code for Q-learning and experience replay Q-learning:
- Q-learning pure code
- Experience replay Q-learning pure code
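As mentioned above, experience replay adds essentially two simple functions on top of plain Q-learning: storing transitions and sampling a random minibatch. The sketch below (not the repo's exact code) shows one common way to write them; note that nothing in it depends on whether the action space is discrete or continuous.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped automatically

    def store(self, state, action, reward, next_state, done):
        """Function 1: append one transition to the buffer."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Function 2: draw a random minibatch for the Q-learning update."""
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```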
The presentation files for the LINK-SIC workshop can be downloaded from the folder called presentation. There, you can find the presentation files for day1 and day2.