# Experiments for Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks

This repository contains the source code for the experiments in the paper *Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks*.

## Requirements

To automatically install all dependencies:

```setup
pip install -r requirements.txt
```

Our deep R-learning algorithm is implemented on top of [OpenAI Baselines](https://github.com/openai/baselines). If the automatic installation does not cover it, follow their instructions to install Baselines manually.
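
For reference, R-learning is an average-reward temporal-difference method (Schwartz, 1993). A minimal tabular sketch of its update is below; the function name, array layout, and learning rates are illustrative assumptions, not the repository's API.

```python
import numpy as np

def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One tabular R-learning step (Schwartz, 1993).

    Q is an (n_states, n_actions) array; rho is the average-reward estimate.
    """
    greedy = Q[s, a] == Q[s].max()  # was the taken action greedy?
    # Average-reward TD update: reward is measured relative to rho.
    Q[s, a] += alpha * (r - rho + Q[s_next].max() - Q[s, a])
    if greedy:
        # rho is nudged toward the average reward, but only on greedy steps.
        rho += beta * (r - rho + Q[s_next].max() - Q[s].max())
    return rho
```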

## Cart Pole

### Environment

The environment `cartpole_continuing.py` is modified from the standard cart pole environment in [OpenAI Gym](https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py). The continuing cart pole environment removes the episode termination conditions, allowing the cart and the pole to take any position.
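
The idea can be illustrated with a wrapper over the classic Gym API, as in the sketch below. This is illustrative only: `cartpole_continuing.py` modifies the environment itself and may define rewards differently.

```python
import gym

class ContinuingCartPole(gym.Wrapper):
    """Drop CartPole's termination signal so the task never ends.

    A minimal sketch, assuming the classic Gym API; not the repository's
    actual implementation.
    """

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Clear the classic env's "already done" counter so it keeps
        # returning rewards instead of warning and returning 0.
        self.env.unwrapped.steps_beyond_done = None
        return obs, reward, False, info  # never terminate

env = ContinuingCartPole(gym.make("CartPole-v1"))
```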

### Training

To train the cart pole agent with our temporal-logic-based shaping method, run this command:

```train
python continuing_cartpole/train_cartpole.py
```

## Continual Area Sweeping

### Environment

The environment `env.py` implements a grid world in which a robot sweeps repeatedly and non-uniformly to maximize average reward. Events appear in grid cells with different probabilities, and the robot receives a reward by visiting a cell with an active event (such as trash to be picked up). The paper studies two scenarios: 1) events only appear in a certain region, and 2) a human moves around and may generate an event at each step.
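
A minimal sketch of the event dynamics described above (the class name, grid size, and per-cell probabilities are assumptions; see `env.py` for the actual implementation):

```python
import numpy as np

class SweepingGrid:
    """Illustrative sketch of non-uniform event spawning and sweeping."""

    def __init__(self, grid_size=10, rng=None):
        self.rng = rng or np.random.default_rng()
        # Per-cell event probabilities; non-uniform across the grid.
        self.p_event = self.rng.uniform(0.0, 0.01, (grid_size, grid_size))
        self.active = np.zeros((grid_size, grid_size), dtype=bool)

    def step(self, new_pos):
        # New events appear independently in each cell.
        self.active |= self.rng.random(self.active.shape) < self.p_event
        reward = 1.0 if self.active[new_pos] else 0.0
        self.active[new_pos] = False  # sweeping the cell clears its event
        return reward
```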

### Training

To run the experiment in the first scenario, run this command:

```train
python continual_area_sweeping/shield_experiment_region.py
```

To run the experiment in the second scenario, run this command:

```train
python continual_area_sweeping/shield_experiment_person.py
```

## Grid World

### Environment

`gridworld.py` implements a continuing grid world environment where the agent receives a reward and is "transported" to a random cell when it reaches the designated "goal" cell. In this experiment, the goal cell is in the bottom-right corner.
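
A minimal sketch of these continuing dynamics (the grid size, goal location, and action names are illustrative assumptions; `gridworld.py` is the authoritative implementation):

```python
import numpy as np

rng = np.random.default_rng()
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action, size=5, goal=(4, 4)):
    dr, dc = MOVES[action]
    row = min(max(pos[0] + dr, 0), size - 1)
    col = min(max(pos[1] + dc, 0), size - 1)
    if (row, col) == goal:
        # Reaching the goal yields a reward and "transports" the agent
        # to a random cell, so the task never terminates.
        return (int(rng.integers(size)), int(rng.integers(size))), 1.0
    return (row, col), 0.0
```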

### Training

Shaping and shielding methods with standard R-learning are implemented. To train the agent, run this command:

```train
python gridworld/continuing_experiment.py --csv results.csv
```
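
For background, classical potential-based reward shaping (Ng et al., 1999) takes the form sketched below. The paper's temporal-logic-based shaping for continuing tasks builds on this idea but differs; this snippet is not the repository's implementation.

```python
def shaped_reward(r, phi_s, phi_s_next, gamma=1.0):
    # Add the (discounted) change in a potential function to the reward.
    # In the continuing, average-reward setting gamma is typically 1.
    return r + gamma * phi_s_next - phi_s
```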