Unification

Pre-release
@cpnota cpnota released this 02 Aug 20:51
· 174 commits to master since this release
e4fbe6e

This release contains several usability enhancements! The biggest change, however, is a refactor: the policy classes now extend Approximation. This means that target networks, learning rate schedulers, and model saving are all handled in one place!

The full list of changes is:

  • Refactored experiment API (#88)
  • Policies inherit from Approximation (#89)
  • Models now save themselves automatically every 200 updates. Also, you can load models and watch them play in each environment! (#90)
  • Automatically set the temperature in SAC (#91)
  • Schedule learning rates and other parameters (#92)
  • SAC bugfix
  • Refactor usage of target networks. eval() and target() are now distinct: the former runs a forward pass of the current network, and the latter runs one on the target network. Neither creates a computation graph. (#94)
  • Tweak AdvantageBuffer API. Also fix a minor bug in A2C (#95)
  • Report the best returns so far in a separate metric (#96)
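
To illustrate the eval() / target() distinction from #94, here is a minimal PyTorch sketch. It is not the library's Approximation class: the class name `ApproximationSketch`, the `update` method, and the `target_update_frequency` parameter are hypothetical names chosen for this example, and periodically copying the weights is only one possible way to keep a target network in sync.

```python
import copy

import torch
from torch import nn


class ApproximationSketch:
    """Hypothetical stand-in for illustration; not the library's class."""

    def __init__(self, model, optimizer, target_update_frequency=100):
        self.model = model
        self.optimizer = optimizer
        # The target network starts as a frozen copy of the current network.
        self.target_model = copy.deepcopy(model)
        self.target_update_frequency = target_update_frequency  # assumed name
        self.updates = 0

    def __call__(self, x):
        # Ordinary forward pass: builds a graph so a loss can be backpropagated.
        return self.model(x)

    def eval(self, x):
        # Forward pass of the current network, without a computation graph.
        with torch.no_grad():
            return self.model(x)

    def target(self, x):
        # Forward pass of the target network, without a computation graph.
        with torch.no_grad():
            return self.target_model(x)

    def update(self, loss):
        # One gradient step; every N steps, copy the weights into the target.
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.updates += 1
        if self.updates % self.target_update_frequency == 0:
            self.target_model.load_state_dict(self.model.state_dict())


torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
q = ApproximationSketch(model, optimizer, target_update_frequency=2)
states = torch.randn(3, 4)

print(q(states).requires_grad)         # graph is built
print(q.eval(states).requires_grad)    # no graph
print(q.target(states).requires_grad)  # no graph
```

In this sketch, eval() and target() return the same values right after a sync and drift apart as the current network is trained, which is what makes the target network useful as a slowly moving reference.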