Behaviors of Atari envs have changed by atari-py>=0.2 #1777
Comments
openai/atari-py#49 looks like the biggest change.
The changes were meant specifically to not change the behavior of the environments: openai/atari-py#49 (comment). @JesseFarebro, any idea what might be going on here?

Thanks for investigating this @muupan.

Thanks for bringing this to my attention. I'll investigate further and let you know what my findings are.
Thanks for the reproduction @muupan.

With regards to Ms. Pacman: essentially what is happening is that we press the reset button too many times, and this leads to a different starting state in ALE. Here are two images comparing the first frame in each ALE version:

[Images: Ms. Pacman first frame, ALE v0.5.2 vs. ALE v0.6.0]

With regards to Chopper Command: the issue that affects Ms. Pacman also affects Chopper Command. The call to

As for the performance difference in your PPO agent: that doesn't actually strike me as overly surprising. If we assume that these two runs weren't completely deterministic (due to a different episode start state as discussed above, or an improper seed), these curves seem within reason for 3 seeds. I looked at the original PPO paper, and their results on Chopper Command show large variance between their 3 runs.

I have opened an issue upstream (Farama-Foundation/Arcade-Learning-Environment#291). Hopefully, this helps clear some things up.
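To check the changed-start-state explanation locally, one could hash the first post-reset frame under each installed `atari-py` version and compare the digests; a minimal sketch (the env id, seed, and hashing choice are assumptions, not from the thread):

```python
# Sketch for checking the changed-start-state explanation: hash the first
# post-reset frame, then compare the digest across atari-py installations.
# The env id, seed, and use of MD5 are assumptions; the comment above only
# compares the frames visually.
import hashlib

import gym

env = gym.make("MsPacmanNoFrameskip-v4")
env.seed(0)        # old gym (pre-0.21) seeding API, matching the versions tested here
obs = env.reset()  # obs is the raw RGB frame as a numpy array

# If the starting state changed between atari-py 0.1.x and 0.2.x,
# this digest will differ between the two installations.
print(hashlib.md5(obs.tobytes()).hexdigest())
```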
Thanks for investigating @JesseFarebro! |
This is being tracked upstream in Farama-Foundation/Arcade-Learning-Environment#291. Feel free to close this.
The behaviors of Atari envs seem affected by the version of `atari-py`, even though the env id is the same. Below is the output of this code for each pair of `gym` and `atari-py`. It seems like `atari-py` is the cause of the difference, but since `gym` has required `atari-py~=0.2.0` in `setup.py` since 1.3.0 (#1535), it should be responsible for the version of `atari-py`. That is why I opened this issue here, not in https://github.com/openai/atari-py.

- `gym==0.15.4`, `atari-py==0.2.6`: 2009 90.0
- `gym==0.15.4`, `atari-py==0.2.0`: 2009 90.0
- `gym==0.15.4`, `atari-py==0.1.15`: 1329 90.0
- `gym==0.15.4`, `atari-py==0.1.4`: 1329 90.0
- `gym==0.12.6`, `atari-py==0.2.6`: 2009 90.0
- `gym==0.12.6`, `atari-py==0.2.0`: 2009 90.0
- `gym==0.12.6`, `atari-py==0.1.15`: 1329 90.0
- `gym==0.12.6`, `atari-py==0.1.4`: 1329 90.0

I also confirmed such a difference for `ChopperCommandNoFrameskip-v4`. I am concerned that these differences might significantly affect the evaluation of RL algorithms. Has anyone investigated the effect?
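The script whose output is listed above is not included in this excerpt. A minimal reproduction sketch under stated assumptions (that the two printed numbers are the length and return of one deterministic episode, and that the env id, seed, and fixed NOOP policy are illustrative rather than the author's):

```python
# Reproduction sketch (assumptions: the two numbers reported per version
# pair are episode length and return of one deterministic episode; the
# env id, seed, and fixed NOOP policy are illustrative, not the author's).
import gym

env = gym.make("MsPacmanNoFrameskip-v4")
env.seed(0)  # old gym (pre-0.21) API, matching gym==0.12.6 / 0.15.4
obs = env.reset()

steps, total_reward, done = 0, 0.0, False
while not done:
    # Always take action 0 (NOOP) so the rollout depends only on the
    # emulator, making any atari-py version difference visible.
    obs, reward, done, info = env.step(0)
    steps += 1
    total_reward += reward

print(steps, total_reward)
```

Running the same script in separate virtualenvs that pin each version pair (e.g. `pip install gym==0.15.4 atari-py==0.1.15`) would then isolate the emulator as the source of the change.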