Unification
Pre-release
This release contains several usability enhancements! The biggest change, however, is a refactor: the policy classes now extend `Approximation`. This means that target networks, learning rate schedulers, and model saving are all handled in one place!
The full list of changes:
- Refactored experiment API (#88)
- Policies inherit from `Approximation` (#89)
- Models now save themselves automatically every 200 updates. Also, you can load models and watch them play in each environment! (#90)
- Automatically set the temperature in SAC (#91)
- Schedule learning rates and other parameters (#92)
- SAC bugfix
- Refactored usage of target networks. There is now a difference between `eval()` and `target()`: the former runs a forward pass of the current network, the latter does so on the target network, each without creating a computation graph. (#94)
- Tweaked the `AdvantageBuffer` API. Also fixed a minor bug in A2C (#95)
- Report the best returns so far in a separate metric (#96)
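The `eval()`/`target()` distinction from #94 can be sketched roughly as below. This is a minimal pure-Python illustration of the target-network pattern, not the library's actual implementation: the `Linear` toy model and the exact method bodies are assumptions for demonstration.

```python
# Hypothetical sketch of the eval()/target() split: eval() runs the
# current network, target() runs a frozen copy that is synced
# periodically. (NOT the library's real internals.)
import copy

class Linear:
    """Toy stand-in for a neural network: y = w * x."""
    def __init__(self, w):
        self.w = w

    def __call__(self, x):
        return self.w * x

class Approximation:
    def __init__(self, model):
        self.model = model                        # current network
        self.target_model = copy.deepcopy(model)  # frozen target copy

    def eval(self, x):
        # Forward pass of the *current* network. In a real PyTorch
        # version this would run under torch.no_grad() so that no
        # computation graph is created.
        return self.model(x)

    def target(self, x):
        # Forward pass of the *target* network, also graph-free.
        return self.target_model(x)

    def update_target(self):
        # Periodically sync the target network with the current one.
        self.target_model = copy.deepcopy(self.model)

approx = Approximation(Linear(2.0))
approx.model.w = 3.0          # simulate a training update
current = approx.eval(4.0)    # uses the updated weights -> 12.0
stale = approx.target(4.0)    # target still holds old weights -> 8.0
approx.update_target()
synced = approx.target(4.0)   # now in sync -> 12.0
```

The point of the split is that value-based methods (e.g. the Q-learning targets in SAC) bootstrap from a slowly-moving copy of the network for stability, while the current network is what gets optimized.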