Used in KataGo, described here.
Basically, Fixup Initialization purportedly allows us to get rid of batch normalization layers entirely, which leads to all sorts of advantages described in the link.
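For concreteness, here is a minimal sketch of what a Fixup-style residual block looks like in PyTorch: the batch norm layers are replaced by scalar biases and a scalar multiplier, the first conv in each residual branch is downscaled by L^(-1/(2m-2)) (here m = 2), and the last conv is zero-initialized so each block starts out as the identity. This is just an illustration of the general recipe from the Fixup paper, not KataGo's actual block definition; the class name, layer layout, and hyperparameters are made up for the example.

```python
import math
import torch
import torch.nn as nn


class FixupResidualBlock(nn.Module):
    """Residual block without batch norm, following the Fixup recipe.

    Hypothetical sketch for illustration -- not KataGo's actual architecture.
    """

    def __init__(self, channels: int, num_blocks: int):
        super().__init__()
        # Scalar biases and a scalar multiplier stand in for the affine
        # part of batch norm.
        self.bias1a = nn.Parameter(torch.zeros(1))
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bias1b = nn.Parameter(torch.zeros(1))
        self.bias2a = nn.Parameter(torch.zeros(1))
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.scale = nn.Parameter(torch.ones(1))
        self.bias2b = nn.Parameter(torch.zeros(1))
        self.relu = nn.ReLU(inplace=True)

        # Fixup: scale the He init of the first conv by L^(-1/(2m-2)), where
        # L = number of residual blocks and m = 2 layers per branch, so the
        # exponent is -1/2.
        fan_in = self.conv1.weight[0].numel()
        std = math.sqrt(2.0 / fan_in) * num_blocks ** -0.5
        nn.init.normal_(self.conv1.weight, mean=0.0, std=std)
        # Fixup: zero-init the last conv so each block starts as the identity.
        nn.init.zeros_(self.conv2.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(x + self.bias1a)
        out = self.relu(out + self.bias1b)
        out = self.conv2(out + self.bias2a)
        out = out * self.scale + self.bias2b
        return self.relu(out + x)
```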
I vaguely recall David Wu backtracking on the value of this idea when we spoke. So we should double-check with him on this.
...the part about fixup init is a bit outdated and is going to get updated once I publish the new architectures in a few months - fixup actually does have some significant costs on final neural net fitting quality, that I hadn't known at the time, so sticking with batch norm is probably the best approach still if you want to just get something simple working.