
Potential mistake in SegFormer model: patch_size argument not being used #141

Open
jonasdieker opened this issue Dec 10, 2023 · 1 comment


jonasdieker commented Dec 10, 2023

Hi there,

First of all, thank you for your work and for providing all the code! I was looking at the following lines in the SegFormer backbone model:

```python
class MixVisionTransformer(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dims=[64, 128, 256, 512],
                 num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, drop_rate=0.,
                 attn_drop_rate=0., drop_path_rate=0., norm_layer=nn.LayerNorm,
                 depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1]):
        super().__init__()
        self.num_classes = num_classes
        self.depths = depths

        # patch_embed
        self.patch_embed1 = OverlapPatchEmbed(img_size=img_size, patch_size=7, stride=4, in_chans=in_chans,
                                              embed_dim=embed_dims[0])
        self.patch_embed2 = OverlapPatchEmbed(img_size=img_size // 4, patch_size=3, stride=2, in_chans=embed_dims[0],
                                              embed_dim=embed_dims[1])
        self.patch_embed3 = OverlapPatchEmbed(img_size=img_size // 8, patch_size=3, stride=2, in_chans=embed_dims[1],
                                              embed_dim=embed_dims[2])
        self.patch_embed4 = OverlapPatchEmbed(img_size=img_size // 16, patch_size=3, stride=2, in_chans=embed_dims[2],
                                              embed_dim=embed_dims[3])
```
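For reference, here is a small pure-Python sketch (not code from the repository) of the feature-map sizes those hard-coded values produce. It assumes that `OverlapPatchEmbed` wraps a single convolution with `padding = patch_size // 2`, so each stage follows the usual convolution output-size formula; `conv_out` and `stage_resolutions` are hypothetical helper names.

```python
from typing import List, Tuple

# (kernel, stride) per stage, copied from the hard-coded values in the snippet above.
HARD_CODED_STAGES: List[Tuple[int, int]] = [(7, 4), (3, 2), (3, 2), (3, 2)]


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output length along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_resolutions(img_size: int, stages: List[Tuple[int, int]]) -> List[int]:
    """Side length of the token grid after each patch-embedding stage.

    Assumption: padding = kernel // 2, mirroring how OverlapPatchEmbed is
    assumed to configure its convolution.
    """
    sizes = []
    size = img_size
    for kernel, stride in stages:
        size = conv_out(size, kernel, stride, kernel // 2)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(stage_resolutions(224, HARD_CODED_STAGES))  # [56, 28, 14, 7]
```

Under that assumption, a 224×224 input yields grids of 56, 28, 14 and 7 tokens per side (1/4, 1/8, 1/16 and 1/32 of the input), whatever value is passed as `patch_size`.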

I noticed that the `patch_size` argument is never actually passed to the `OverlapPatchEmbed` modules.

Instead, you hard-coded patch sizes of [7, 3, 3, 3] (with strides [4, 2, 2, 2]) for the four stages. While these are of course still smaller than the 16x16 patches in ViT, and therefore still better suited to detection and segmentation tasks, the model deviates from the paper, where you describe an initial patch size of 4. This also means that the argument has no effect in any class that inherits from this one!
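To make the comparison above concrete, the following pure-Python sketch (hypothetical helper names, not repository code) computes the first-stage token grid on a 224×224 input for three choices: ViT-style non-overlapping 16×16 patches, non-overlapping 4×4 patches, and the hard-coded overlapping 7×7 kernel with stride 4. Padding of 0 for the non-overlapping cases and `7 // 2 = 3` for the overlapping case are assumptions about how each would be expressed as a convolution.

```python
from typing import Dict, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output length along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical (kernel, stride, padding) settings for the first embedding layer.
FIRST_STAGE_OPTIONS: Dict[str, Tuple[int, int, int]] = {
    "vit_16x16": (16, 16, 0),     # non-overlapping 16x16 patches
    "patch_4x4": (4, 4, 0),       # non-overlapping 4x4 patches
    "overlap_7x7_s4": (7, 4, 3),  # hard-coded OverlapPatchEmbed values (assumed padding)
}


def first_stage_grid(img_size: int) -> Dict[str, Tuple[int, int]]:
    """Map each option to (tokens per side, total number of tokens)."""
    result = {}
    for name, (kernel, stride, padding) in FIRST_STAGE_OPTIONS.items():
        side = conv_out(img_size, kernel, stride, padding)
        result[name] = (side, side * side)
    return result


if __name__ == "__main__":
    print(first_stage_grid(224))
    # {'vit_16x16': (14, 196), 'patch_4x4': (56, 3136), 'overlap_7x7_s4': (56, 3136)}
```

Under these assumptions, the 4×4 patches and the overlapping 7×7/stride-4 kernel produce the same 56×56 grid. The difference lies in how many neighbouring pixels each token's kernel covers, not in the output resolution.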

Maybe I am misunderstanding something, so I would be happy if you could shed some light on this potential mistake! Thank you.


hubert10 commented May 9, 2024

Hi Jonas, I've observed the same thing; this needs to be clarified either in the paper or in the code above!
