Normalization usage? #4
Hi, thank you! That's a very good question, and to be honest, I do not remember exactly what the justification was for doing it this way. According to the theoretical exposition in the paper, we only need the standardization placed like this: FC->ReLU->Norm->FC->ReLU. I suppose that at some point we decided to add it to other layers as well, to be closer in spirit to batch normalization, and the only reasonable place we found to add it was FC->Norm->ReLU->Norm->FC->ReLU. So, to answer your question, it is more of a coincidence that it is positioned this way (I agree that it looks strange). Interestingly, I just tried repositioning these normalization layers "normally" in a couple of ways and found that the performance becomes very bad. I think that is due to the hyperparameters, but I will need time to dig deeper.
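To compare the two placements mentioned above side by side, here is a minimal pure-Python sketch. It is not the notebook's code: it assumes the normalization is plain per-feature standardization over the batch of class vectors using batch statistics only (no running averages, no learnable parameters), and the helpers `standardize`, `make_linear`, and `run`, the layer sizes, and the random inputs are all hypothetical.

```python
import random
import statistics


def standardize(batch, eps=1e-8):
    """Hypothetical class standardization: zero mean, unit std per feature over the batch."""
    cols = list(zip(*batch))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / (s + eps) for x, m, s in zip(row, means, stds)] for row in batch]


def relu(batch):
    """Element-wise ReLU."""
    return [[max(0.0, x) for x in row] for row in batch]


def make_linear(in_dim, out_dim, rng):
    """Bias-free fully connected layer with random Gaussian weights (illustrative only)."""
    weights = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)] for _ in range(out_dim)]

    def linear(batch):
        return [[sum(w * x for w, x in zip(w_row, row)) for w_row in weights] for row in batch]

    return linear


def run(layers, batch):
    """Apply the layers in order."""
    for layer in layers:
        batch = layer(batch)
    return batch


rng = random.Random(0)
fc1, fc2 = make_linear(4, 8, rng), make_linear(8, 3, rng)

# Placement the paper's theory needs: FC -> ReLU -> Norm -> FC -> ReLU
paper_stack = [fc1, relu, standardize, fc2, relu]
# Placement used in the notebook: FC -> Norm -> ReLU -> Norm -> FC -> ReLU
repo_stack = [fc1, standardize, relu, standardize, fc2, relu]

# 5 hypothetical classes, each described by 4 attributes in [0, 1)
attrs = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(5)]
out_paper = run(paper_stack, attrs)
out_repo = run(repo_stack, attrs)
```

Because both stacks share `fc1` and `fc2`, any difference between `out_paper` and `out_repo` comes from the placement alone. This toy setup says nothing about training performance; the observation that repositioning hurt results was made on the real model.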
Thank you for the explanation. Some additional questions:
I wanted to double-check whether I had understood it wrong, i.e., whether it is theoretically correct for the normalization to be in the middle of the layers.
Hi, thank you for the awesome work.
I have a question on using class normalization.
According to the 'class-norm-for-czsl.ipynb' file in this repo, ClassNorm (CN) seems to be applied in the following form:
FC - CN - ReLU - CN - FC - ReLU.
But intuitively this seems a little odd, since layers are usually stacked in the form:
FC - Normalization - ReLU - FC - Normalization - ReLU.
The current form seems to have an activation layer between two ClassNorm layers, without any Conv/FC layer in between.
Is this intended?
I have gone through the paper but could not find the answer, possibly because I misunderstood something.
Could you kindly clarify on this?
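One way to see that a ReLU sandwiched between two CN layers is not redundant: ReLU clips the negative values, so features leave it with a positive mean and a standard deviation below one, and the second CN re-standardizes them. The pure-Python sketch below checks this numerically. It assumes CN is plain per-feature standardization with batch statistics; the `standardize` and `column_stats` helpers and the input numbers are hypothetical, and the check says nothing about whether this placement is the best one.

```python
import statistics


def standardize(batch, eps=1e-8):
    """Hypothetical CN: zero mean, unit std per feature over the batch."""
    cols = list(zip(*batch))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / (s + eps) for x, m, s in zip(row, means, stds)] for row in batch]


def relu(batch):
    """Element-wise ReLU."""
    return [[max(0.0, x) for x in row] for row in batch]


def column_stats(batch):
    """(mean, population std) of each feature column."""
    return [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*batch)]


# Hypothetical FC outputs: 4 classes x 2 features
fc_out = [[1.0, -2.0], [3.0, 0.5], [-1.0, 4.0], [5.0, 1.5]]

after_cn = standardize(fc_out)       # each feature: mean ~0, std ~1
after_relu = relu(after_cn)          # negatives clipped: mean > 0, std < 1
after_cn2 = standardize(after_relu)  # each feature back to mean ~0, std ~1

for name, batch in [("CN", after_cn), ("CN-ReLU", after_relu), ("CN-ReLU-CN", after_cn2)]:
    print(name, [(round(m, 3), round(s, 3)) for m, s in column_stats(batch)])
```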