Policy Surprise Weighting #36

shindavid · 2023-01-30T16:54:00Z

Used by KataGo, described here.

Basically, this weights the self-play-generated samples by how "surprising" they are. That is, if the MCTS visit-count distribution looks very different from the policy prior, multiple copies of that row of data are included, so that the next-generation neural network puts more weight on correcting it.
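A minimal sketch of the idea follows. It assumes "surprise" is measured as the KL divergence from the normalized MCTS visit counts to the policy prior, and it assumes a hypothetical weighting in which a fraction `uniform_frac` of each row's weight is uniform and the rest is proportional to surprise, normalized so the average weight is 1. Fractional weights become whole copy counts through stochastic rounding. The names `surprise_weights`, `copy_counts` and `uniform_frac` are illustrative. This is not KataGo's actual code or its exact formula.

```python
import math
import random
from typing import List, Sequence


def kl_divergence(target: Sequence[float], prior: Sequence[float], eps: float = 1e-12) -> float:
    """KL(target || prior) in nats; entries where target == 0 contribute nothing."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, prior) if t > 0)


def surprise_weights(
    mcts_counts: Sequence[Sequence[float]],
    priors: Sequence[Sequence[float]],
    uniform_frac: float = 0.5,  # hypothetical: share of weight spread uniformly
) -> List[float]:
    """Per-row training weights whose mean is 1 (assumed scheme, see lead-in)."""
    surprises = []
    for counts, prior in zip(mcts_counts, priors):
        total = sum(counts)
        target = [c / total for c in counts]  # normalize visit counts to a distribution
        surprises.append(kl_divergence(target, prior))
    mean_s = sum(surprises) / len(surprises)
    weights = []
    for s in surprises:
        # If nothing is surprising, fall back to uniform weight 1 for every row.
        rel = s / mean_s if mean_s > 0 else 1.0
        weights.append(uniform_frac + (1.0 - uniform_frac) * rel)
    return weights


def copy_counts(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Stochastically round each non-negative weight to an integer number of copies."""
    copies = []
    for w in weights:
        whole = int(w)  # floor, since w >= 0
        copies.append(whole + (1 if rng.random() < w - whole else 0))
    return copies


if __name__ == "__main__":
    counts = [[50, 50], [90, 10]]      # row 0 matches the prior, row 1 does not
    priors = [[0.5, 0.5], [0.5, 0.5]]
    w = surprise_weights(counts, priors)
    print(w, copy_counts(w, random.Random(0)))
```

In this example the unsurprising row gets weight 0.5 and the surprising row gets 1.5, so it is written out more often on average. Stochastic rounding keeps the expected number of copies equal to the weight without writing fractional rows.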

Implement this and validate its value through experimentation.

shindavid added KataGo replication task learning improvement labels Jan 30, 2023
