
Behaviors of Atari envs have changed by atari-py>=0.2 #1777

Closed
muupan opened this issue Jan 5, 2020 · 8 comments

Comments

@muupan

muupan commented Jan 5, 2020

The behaviors of Atari envs seem to be affected by the version of atari-py even though the env id is the same.

import gym

# Run one episode of Ms. Pacman, repeating the same action every step,
# and report the episode length and return.
env = gym.make('MsPacmanNoFrameskip-v4')
env.seed(0)
env.reset()
done = False
t = 0  # number of steps
R = 0  # total reward
while not done:
    _, r, done, _ = env.step(1)
    t += 1
    R += r
print(t, R)

Below is the output of this code for each pair of gym and atari-py versions. It seems like atari-py is the cause of the difference, but since gym has required atari-py~=0.2.0 in setup.py since 1.3.0 (#1535), gym is responsible for the version of atari-py that gets installed. That is why I opened this issue here rather than in https://github.com/openai/atari-py.

  • gym==0.15.4 atari-py==0.2.6: 2009 90.0
  • gym==0.15.4 atari-py==0.2.0: 2009 90.0
  • gym==0.15.4 atari-py==0.1.15: 1329 90.0
  • gym==0.15.4 atari-py==0.1.4: 1329 90.0
  • gym==0.12.6 atari-py==0.2.6: 2009 90.0
  • gym==0.12.6 atari-py==0.2.0: 2009 90.0
  • gym==0.12.6 atari-py==0.1.15: 1329 90.0
  • gym==0.12.6 atari-py==0.1.4: 1329 90.0
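
As a side note, a minimal sketch of how to record the exact installed versions next to results like the list above, assuming Python 3.8+ so that importlib.metadata is available (pkg_resources works on older Pythons):

from importlib.metadata import version  # Python 3.8+

# Log the exact gym and atari-py builds so that runs with different
# atari-py versions can be told apart later.
print('gym', version('gym'))
print('atari-py', version('atari-py'))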

I also confirmed such a difference for ChopperCommandNoFrameskip-v4.

I am concerned that these differences might significantly affect the evaluation of RL algorithms. Has anyone investigated the effect?

@muupan
Author

muupan commented Jan 6, 2020

Below are the results obtained by my PPO implementation on Atari, using 3 different seeds for each configuration. gym==0.12.1 was used. It seems that the atari-py version actually affects the performance for some games. I'm not sure what change in atari-py or ALE caused this, though.

[Three attached plots of the PPO results]

@kngwyu
Contributor

kngwyu commented Jan 6, 2020

openai/atari-py#49 looks like the biggest change.

@christopherhesse
Contributor

The changes were meant specifically to not change the behavior of the environments: openai/atari-py#49 (comment)

@JesseFarebro any idea what might be going on here?

@christopherhesse
Contributor

Thanks for investigating this @muupan

@JesseFarebro
Contributor

Hi @muupan @christopherhesse,

Thanks for bringing this to my attention. I'll investigate further and let you know what my findings are.

@JesseFarebro
Contributor

JesseFarebro commented Jan 11, 2020

Thanks for the reproduction @muupan.

With regard to Ms. Pacman: essentially, what is happening is that we press the reset button too many times, and this leads to a different starting state in ALE v0.6.0. The following commit introduced this bug: Farama-Foundation/Arcade-Learning-Environment@7bff96b#diff-d9d868097a7403416e6ef352d95dc4feR85. This should have minimal effect on performance comparisons between v0.5.2 and v0.6.*.

Here are two images comparing the first frame in each ALE version:

Ms. Pacman v0.5.2
[first frame, ALE v0.5.2]

Ms. Pacman v0.6.0
[first frame, ALE v0.6.0]
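
A rough way to check this difference programmatically, independent of the screenshots above, is to hash the first frame after reset under each installed atari-py version; a minimal sketch (the env id and seed come from the reproduction script earlier in this thread):

import hashlib

import gym

# Seed the env and hash the very first observation after reset.
# Running this under atari-py 0.1.x and 0.2.x should give different
# digests if the starting state differs between ALE versions.
env = gym.make('MsPacmanNoFrameskip-v4')
env.seed(0)
obs = env.reset()
print(hashlib.md5(obs.tobytes()).hexdigest())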

With regard to Chopper Command: the issue that affects Ms. Pacman also affects Chopper Command. The call to softReset happens in two places: the one linked above and Farama-Foundation/Arcade-Learning-Environment@31d8e17#diff-0ff5bae3de90143156577bc8324e6d27R155.

As for the performance difference in your PPO agent, that doesn't actually strike me as overly surprising. If we assume that these runs weren't completely deterministic (due to a different episode start state as discussed above, or an improper seed), these curves seem within reason for 3 seeds. I looked at the original PPO paper, and their results on Chopper Command show large variance between their 3 runs.
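
One way to check the determinism assumption directly is to roll out the same fixed action sequence twice with the same seed and compare the observations; a minimal sketch (the rollout helper is hypothetical, reusing the env and action from the reproduction script above):

import gym
import numpy as np

def rollout(seed, n_steps=500):
    # Collect observations for a fixed action sequence under a fixed seed.
    env = gym.make('MsPacmanNoFrameskip-v4')
    env.seed(seed)
    frames = [env.reset()]
    for _ in range(n_steps):
        obs, _, done, _ = env.step(1)  # always take the same action
        frames.append(obs)
        if done:
            break
    env.close()
    return frames

a, b = rollout(0), rollout(0)
identical = len(a) == len(b) and all(np.array_equal(x, y) for x, y in zip(a, b))
print('deterministic:', identical)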

I have opened an issue upstream (Farama-Foundation/Arcade-Learning-Environment#291).

Hopefully, this helps clear some things up.

@christopherhesse
Contributor

Thanks for investigating @JesseFarebro!

@JesseFarebro
Contributor

This is being tracked upstream at Farama-Foundation/Arcade-Learning-Environment#291. Feel free to close this.

@jkterry1 closed this as completed Aug 5, 2021