Batch-Constrained deep Q-learning (BCQ) [1] is a batch reinforcement learning method for continuous control. BCQ aims to perform Q-learning while constraining the action space to eliminate actions that are unlikely to be selected by the behavioral policy $\pi_b$, and are therefore unlikely to be contained in the batch. At its core, BCQ uses a state-conditioned generative model $G_\omega(s)$ to model the distribution of data in the batch, akin to a behavioral cloning model. As it is easier to sample from $G_\omega(s)$ than to model $\pi_b$ exactly in a continuous action space, the policy is defined by sampling $n$ actions $a_i \sim G_\omega(s)$ and selecting the highest valued action according to a Q-network. Since BCQ was designed for continuous actions, the method also includes a perturbation model $\xi_\phi(s, a_i, \Phi)$, which is a residual added to the sampled actions in the range $[-\Phi, \Phi]$, and trained with the deterministic policy gradient. Finally, the authors include a weighted version of Clipped Double Q-learning to penalize high-variance estimates and reduce overestimation bias, using a convex combination of the two target Q-values with weight $\lambda$:

$$r + \gamma \max_{a_i} \left[ \lambda \min_{j=1,2} Q_{\theta'_j}(s', a_i) + (1 - \lambda) \max_{j=1,2} Q_{\theta'_j}(s', a_i) \right],$$

where $a_i = a_i + \xi_\phi(s', a_i, \Phi)$ and $a_i \sim G_\omega(s')$. During evaluation, the policy is defined similarly, by sampling $n$ actions from $G_\omega(s)$, perturbing them, and selecting the highest valued perturbed action:

$$\pi(s) = \underset{a_i + \xi_\phi(s, a_i, \Phi)}{\arg\max} \; Q_\theta\big(s, a_i + \xi_\phi(s, a_i, \Phi)\big), \quad a_i \sim G_\omega(s), \; i = 1, \dots, n.$$
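The sampling-and-perturbation policy and the weighted Clipped Double Q target above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `sample_from_generative_model`, `perturbation`, `q1`, and `q2` are hypothetical stand-ins for the conditional VAE $G_\omega$, the perturbation model $\xi_\phi$, and the two Q-networks, which would be learned neural networks in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM = 2
PHI = 0.05   # perturbation range Phi
N = 10       # number of candidate actions sampled per state
LAM = 0.75   # weight on the min term in the weighted Clipped Double Q target
GAMMA = 0.99

def sample_from_generative_model(state, n):
    """Hypothetical stand-in for G_w(s): returns n candidate actions.
    In BCQ this is a conditional VAE trained on the batch."""
    return np.tanh(rng.normal(size=(n, ACTION_DIM)))

def perturbation(state, actions):
    """Hypothetical stand-in for xi_phi(s, a, Phi): a residual
    clipped to [-PHI, PHI] (trained with the DPG in the paper)."""
    return np.clip(0.1 * np.tanh(actions), -PHI, PHI)

def q1(state, actions):
    """Toy Q-network 1 (arbitrary smooth function for illustration)."""
    return -np.sum((actions - 0.3) ** 2, axis=-1)

def q2(state, actions):
    """Toy Q-network 2."""
    return -np.sum((actions - 0.2) ** 2, axis=-1)

def bcq_policy(state):
    """pi(s): sample n actions from G_w(s), perturb them, and pick
    the perturbed action with the highest Q-value."""
    actions = sample_from_generative_model(state, N)
    perturbed = np.clip(actions + perturbation(state, actions), -1.0, 1.0)
    return perturbed[np.argmax(q1(state, perturbed))]

def bcq_target(reward, next_state):
    """Weighted Clipped Double Q target:
    r + gamma * max_i [ lam * min_j Q_j + (1 - lam) * max_j Q_j ],
    evaluated over perturbed actions sampled from G_w(s')."""
    actions = sample_from_generative_model(next_state, N)
    perturbed = np.clip(actions + perturbation(next_state, actions), -1.0, 1.0)
    qs = np.stack([q1(next_state, perturbed), q2(next_state, perturbed)])
    mixed = LAM * qs.min(axis=0) + (1 - LAM) * qs.max(axis=0)
    return reward + GAMMA * mixed.max()
```

With $\lambda = 1$ this reduces to standard Clipped Double Q-learning; smaller $\lambda$ trades some of the underestimation penalty for the max term, which the authors found helpful in the batch setting.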
```shell
python bcq-train.py --dataset=walker2d-random-v2 --seed=0 --gpu=0
```
- [1] Fujimoto, S., Meger, D., and Precup, D. Off-Policy Deep Reinforcement Learning without Exploration. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2019, pp. 2052-2062.