whitening operation, which is implemented by a non-parametric layer normalization operator without scaling and bias
You mentioned that the whitening operation is non-parametric, but it seems to be implemented with the norm operation from the original paper, which is parametric.
I found that the teacher model's output first goes through a norm, `self.feature_model.norm(x_tgt)`, and is then passed through `self.ln_tgt(x_tgt)`. In effect, the teacher output goes through layer norm twice. I don't quite understand why.
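To make the distinction in this question concrete, here is a minimal pure-Python sketch of layer normalization with and without scale and bias. It is only an illustration, not the repository's code: the `layer_norm` helper, the sample `x_tgt` values, and the `weight`/`bias` numbers are all hypothetical. In PyTorch the non-parametric form would correspond to `nn.LayerNorm(dim, elementwise_affine=False)`.

```python
import math

def layer_norm(x, weight=None, bias=None, eps=1e-6):
    """Normalize a feature vector to zero mean and unit variance.

    With weight and bias left as None this is the non-parametric form.
    Passing them applies the per-channel scale and shift of a standard
    (parametric) layer norm.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    if weight is not None:
        normed = [n * w for n, w in zip(normed, weight)]
    if bias is not None:
        normed = [n + b for n, b in zip(normed, bias)]
    return normed

def mean_and_var(x):
    """Return the mean and (population) variance of a vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return mean, var

# Hypothetical teacher features and hypothetical learned norm parameters.
x_tgt = [0.5, -1.0, 2.0, 3.5]
weight = [2.0, 0.5, 1.0, 3.0]
bias = [0.1, -0.2, 0.0, 0.4]

# First pass: a parametric norm (stand-in for the backbone's final norm).
after_model_norm = layer_norm(x_tgt, weight, bias)
# Second pass: a non-parametric norm (stand-in for the whitening step).
after_whitening = layer_norm(after_model_norm)

print(mean_and_var(after_model_norm))
print(mean_and_var(after_whitening))
```

Under these assumptions, the output of the parametric norm is generally not zero-mean or unit-variance, because the scale and bias move it away from the normalized statistics. A second norm without scale and bias restores those statistics, which is one possible reading of why both calls might appear in sequence. Whether that is the intent here is for the authors to confirm.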