Used in KataGo, described here.
Basically, Fixup Initialization purportedly allows us to get rid of batch normalization layers entirely, which leads to all sorts of advantages described in the link.
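For concreteness, here is a minimal sketch of what a Fixup-style residual block looks like in PyTorch: the batch norm layers are replaced by scalar biases and a scalar multiplier, the first conv in each residual branch is downscaled by L^(-1/(2m-2)) (here m = 2), and the last conv is zero-initialized so each block starts out as the identity. This is just an illustration of the general recipe from the Fixup paper, not KataGo's actual block definition; the class name, layer layout, and hyperparameters are made up for the example.

```python
import math
import torch
import torch.nn as nn


class FixupResidualBlock(nn.Module):
    """Residual block without batch norm, following the Fixup recipe.

    Hypothetical sketch for illustration -- not KataGo's actual architecture.
    """

    def __init__(self, channels: int, num_blocks: int):
        super().__init__()
        # Scalar biases and a scalar multiplier stand in for the affine
        # part of batch norm.
        self.bias1a = nn.Parameter(torch.zeros(1))
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bias1b = nn.Parameter(torch.zeros(1))
        self.bias2a = nn.Parameter(torch.zeros(1))
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.scale = nn.Parameter(torch.ones(1))
        self.bias2b = nn.Parameter(torch.zeros(1))
        self.relu = nn.ReLU(inplace=True)

        # Fixup: scale the He init of the first conv by L^(-1/(2m-2)), where
        # L = number of residual blocks and m = 2 layers per branch, so the
        # exponent is -1/2.
        fan_in = self.conv1.weight[0].numel()
        std = math.sqrt(2.0 / fan_in) * num_blocks ** -0.5
        nn.init.normal_(self.conv1.weight, mean=0.0, std=std)
        # Fixup: zero-init the last conv so each block starts as the identity.
        nn.init.zeros_(self.conv2.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(x + self.bias1a)
        out = self.relu(out + self.bias1b)
        out = self.conv2(out + self.bias2a)
        out = out * self.scale + self.bias2b
        return self.relu(out + x)
```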
I vaguely recall David Wu backtracking on the value of this idea when we spoke. So we should double-check with him on this.
...the part about fixup init is a bit outdated and is going to get updated once I publish the new architectures in a few months - fixup actually does have some significant costs on final neural net fitting quality, that I hadn't known at the time, so sticking with batch norm is probably the best approach still if you want to just get something simple working.