They explain this in the article: the idea is to suppress the survival bonus in the reward function in order to avoid some local optima, such as a policy that simply stands still to keep collecting the bonus. In Hopper the survival bonus is 1 per step, so shift is set to 1, and in Humanoid it is 5 per step, so shift is set to 5. It is even commented in the ars.py file:
# for Swimmer-v1 and HalfCheetah-v1 use shift = 0
# for Hopper-v1, Walker2d-v1, and Ant-v1 use shift = 1
# for Humanoid-v1 used shift = 5
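To make the effect concrete, here is a minimal sketch of what subtracting the shift does to an episode return. It is not the code from ars.py: the helper `shifted_return` and the two toy reward sequences are made up for illustration, assuming a Hopper-like reward of "survival bonus 1 plus forward progress" per step.

```python
def shifted_return(rewards, shift):
    """Sum per-step rewards after subtracting a constant shift from each step."""
    return sum(r - shift for r in rewards)


# Hypothetical episodes (assumed numbers, not measured from any environment).
# Each step pays a survival bonus of 1.0 plus a forward-progress term.
standing_still = [1.0 + 0.0] * 100  # survives 100 steps, makes no progress
moving_forward = [1.0 + 0.5] * 40   # falls after 40 steps, but moves forward

# Without the shift, standing still looks better than moving.
print(shifted_return(standing_still, shift=0))  # 100.0
print(shifted_return(moving_forward, shift=0))  # 60.0

# With shift = 1 the survival bonus cancels out and only progress counts.
print(shifted_return(standing_still, shift=1))  # 0.0
print(shifted_return(moving_forward, shift=1))  # 20.0
```

So under these assumptions, the shift is simply set equal to the per-step survival bonus of the environment, and to 0 when the environment has no such bonus.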
I don't understand why we need to subtract a shift from the reward. How should this value be set?