Thanks for open-sourcing this great work. While trying the code, I found that training is ~3x slower than Swin Transformer. For example, quadtree-b2, which has similar FLOPs to Swin-T, takes ~2.5s per batch to train. It is even slower (3s/batch) when I align its macro design (depths, embedding dims, etc.) with Swin-T.
Could you share some insight into why this happens?
Not exactly. There are two reasons: 1) we implement quadtree attention in raw CUDA without much optimization, and we would expect a speedup if it were implemented with torch.geometry; 2) the sparse nature of quadtree attention makes it unfriendly to hardware, and this cannot be solved at the code level.
I suggest that the easiest fix is to reduce top K: you can achieve a significant speedup without much performance loss.
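To illustrate why a smaller top K helps, here is a minimal NumPy sketch of the top-K selection step (not the authors' CUDA kernel; `topk_attention` and its shapes are my own simplification). Each query attends only to its K highest-scoring keys, so the softmax and value aggregation cost O(N·K) instead of O(N²), but the gather makes memory access irregular, which is the hardware-unfriendliness mentioned above:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_attention(q, k, v, topk):
    """Sparse attention keeping only the top-K keys per query.

    q, k, v: arrays of shape (N, D). Returns (N, D).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])                      # (N, N)
    # Indices of the K largest scores per row (unsorted order is fine).
    idx = np.argpartition(-scores, topk - 1, axis=-1)[:, :topk]  # (N, K)
    sel = np.take_along_axis(scores, idx, axis=-1)               # (N, K)
    attn = softmax(sel, axis=-1)                                 # softmax over K only
    # Gather the selected values (N, K, D) and aggregate per query.
    return np.einsum('nk,nkd->nd', attn, v[idx])                 # (N, D)
```

With `topk` equal to the full sequence length this reduces to dense attention; shrinking `topk` trades a small amount of accuracy for proportionally less compute per query.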