
Normalization usage? #4

Open
JHLew opened this issue Dec 17, 2021 · 2 comments

JHLew commented Dec 17, 2021

Hi, thank you for the awesome work.
I have a question about how class normalization is used.
According to the 'class-norm-for-czsl.ipynb' file in this repo,
ClassNorm (CN) seems to be applied in the following form:

FC - CN - ReLU - CN - FC - ReLU.

But this seems a little odd to me, since layers are usually stacked in the form:

FC - Normalization - ReLU - FC - Normalization - ReLU.

The current form places an activation layer between the two ClassNorm layers, with no Conv / FC layer in between.
Is this intended?
I have gone through the paper but could not find the answer, possibly because of a gap in my understanding.
Could you kindly clarify this?
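
For concreteness, here is a minimal PyTorch-style sketch of the two stackings being compared; `nn.LayerNorm` only stands in for the repo's ClassNorm module, and the layer sizes are made up for illustration:

```python
import torch.nn as nn

d_in, d_hid, d_out = 512, 1024, 300  # hypothetical sizes, not taken from the repo

# Ordering as it appears in class-norm-for-czsl.ipynb (as described above):
# FC - CN - ReLU - CN - FC - ReLU, i.e. two norms around a single activation.
notebook_head = nn.Sequential(
    nn.Linear(d_in, d_hid),
    nn.LayerNorm(d_hid),   # stand-in for ClassNorm
    nn.ReLU(),
    nn.LayerNorm(d_hid),   # second norm directly after the activation, no FC in between
    nn.Linear(d_hid, d_out),
    nn.ReLU(),
)

# The conventional stacking for comparison:
# FC - Norm - ReLU - FC - Norm - ReLU.
conventional_head = nn.Sequential(
    nn.Linear(d_in, d_hid),
    nn.LayerNorm(d_hid),
    nn.ReLU(),
    nn.Linear(d_hid, d_out),
    nn.LayerNorm(d_out),
    nn.ReLU(),
)
```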

universome (Owner) commented

Hi, thank you!

That's a very good question, and to be honest, I do not remember exactly what our justification was for doing it this way. According to the theoretical exposition in the paper, we only need the standardization like this:

FC->ReLU->Norm->FC->ReLU

I suppose that at some point we decided to additionally put it in some other places to be closer in spirit to batch normalization, and the only reasonable place we found to add it was:

FC->Norm->ReLU->Norm->FC->ReLU

To answer your question: it is more of a coincidence that it is positioned this way (I agree that it looks strange). Interestingly, I just tried repositioning these normalization layers "normally" in a couple of ways and found that performance becomes very bad. I think this is due to the hyperparameters, but I will need time to dig deeper.
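
For reference, a sketch of the placement that the theoretical argument alone would call for, as opposed to the notebook stack sketched above (again with `nn.LayerNorm` standing in for ClassNorm and made-up sizes):

```python
import torch.nn as nn

d_in, d_hid, d_out = 512, 1024, 300  # hypothetical sizes

# FC -> ReLU -> Norm -> FC -> ReLU: a single normalization placed
# between the two FC layers, right after the first activation.
theoretical_head = nn.Sequential(
    nn.Linear(d_in, d_hid),
    nn.ReLU(),
    nn.LayerNorm(d_hid),   # stand-in for ClassNorm
    nn.Linear(d_hid, d_out),
    nn.ReLU(),
)
```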

JHLew (Author) commented Dec 30, 2021

Thank you for the explanation.

Some additional questions:

  1. Were the results reported in the paper obtained with the form
     FC->Norm->ReLU->Norm->FC->ReLU?

  2. I understand that, theoretically, the normalization needs to come before the FC layers (FC->ReLU->Norm->FC->ReLU).
     But when I first read the paper, I expected it to be of the form Norm->FC->ReLU->FC->ReLU.
     In short, I expected ClassNorm to be at the very start of the non-linear head, applied to the logits before they go through the non-linear layers, but it seems ClassNorm is applied in the middle of the non-linear layers.

I wanted to double-check whether I have understood it wrong, and whether it is theoretically correct for the normalization to sit in the middle of the layers.
