The matmul flop counts seem to be off by 2x.
I tested the code on a simple MLP which reads as:
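The original model definition is not reproduced in this report; below is a minimal sketch of such an MLP, assuming only what the analysis further down pins down (a first nn.Linear(784, 512) layer with bias and a batch size of 64). The later layer widths (256, 10) and the names fc1/fc2/fc3 are hypothetical placeholders.

```python
# Minimal sketch (not the original code): the first layer and the batch size
# of 64 follow from the analysis below; the later layer sizes are made up.
import torch
from torch import nn

class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 512)   # weight A: (512, 784), bias: (512,)
        self.fc2 = nn.Linear(512, 256)   # hypothetical
        self.fc3 = nn.Linear(256, 10)    # hypothetical

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
```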
I embedded this in some code; the crucial piece is here:
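The surrounding counting code is not shown either; one way to get a per-layer breakdown with fvcore is FlopCountAnalysis, sketched below. Whether the original run used this wrapper or fvcore.nn.flop_count directly is an assumption.

```python
# Sketch of a counting harness (the original wrapper isn't shown);
# FlopCountAnalysis routes matmuls/addmms through jit_handles.py.
import torch
from fvcore.nn import FlopCountAnalysis

model = SimpleMLP()                    # the sketch model from above
inputs = torch.randn(64, 784)          # batched input, shape (64, 784)

flops = FlopCountAnalysis(model, inputs)
print(flops.by_module())               # per-layer counts
print(flops.total())                   # total count
```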
This returns:
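The full output is not reproduced in this report. Illustratively, the per-module counter from the sketch above would begin roughly as follows; only the first-layer figure (25,690,112) is the value quoted below, and the remaining entries are elided.

```
Counter({'fc1': 25690112, ...})
```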
Let's take the first linear layer as an example: the weight matrix A (in the notation of https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) has shape (512, 784).
The input matrix x (since the example is batched) has shape (64, 784).
Computing the result C = x A^T requires 2*64*512*784 - 64*512 floating-point operations.
However, the example also uses a bias, i.e., another 64*512 additions on top, which gives 2*64*512*784 = 51,380,224 FLOPs in total; the tool reports 25,690,112 for the first layer. By the way, I am not sure why the bias doesn't show up separately.
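To make the factor of two concrete, the arithmetic for the first layer can be checked directly; the reported figure is exactly the MAC count M*N*K:

```python
# Arithmetic behind the discrepancy for the first layer (batch 64, 784 -> 512).
B, N_out, N_in = 64, 512, 784

macs            = B * N_out * N_in                    # 25,690,112 (what the tool reports)
matmul_flops    = 2 * B * N_out * N_in - B * N_out    # multiplies + additions for x @ A.T
flops_with_bias = matmul_flops + B * N_out            # bias adds -> 2*B*N_out*N_in = 51,380,224

print(macs, flops_with_bias, flops_with_bias / macs)  # ratio is exactly 2.0
```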
I believe that the code below is off, since the number of operations for C += A B (using BLAS identifiers) is 2*M*N*K, not M*N*K:
fvcore/fvcore/nn/jit_handles.py, line 225 (at commit e4f0b3d)
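(The referenced source is not pasted here. In spirit, a MAC-convention handle for aten::addmm counts one multiply-accumulate per output element, roughly as in the sketch below, which is a paraphrase with made-up names rather than the actual fvcore code; counting full FLOPs would mean doubling that product and, if desired, adding the M*N bias additions.)

```python
# Rough paraphrase (hypothetical names, not the fvcore source) of the two
# conventions for C = bias + A @ B with A: (M, K), B: (K, N).
def addmm_macs(m: int, n: int, k: int) -> int:
    return m * n * k                 # one MAC per output element -> M*N*K

def addmm_flops(m: int, n: int, k: int) -> int:
    # full FLOP convention: 2*M*N*K for the matmul, plus M*N bias additions
    return 2 * m * n * k + m * n
```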
Different groups adopt different conventions, unfortunately. We implemented the convention in computer vision, which is to use MACs and ignore the FLOPs of the bias.
#77 would improve this, but unfortunately no one is working on it.