Potential mistake in SegFormer model: `patch_size` argument in SegFormer model not being used. #141

jonasdieker · 2023-12-10T14:56:47Z

Hi there,

First of all, thank you for your work and for providing all the code! I was looking at the following lines in the SegFormer backbone model:

SegFormer/mmseg/models/backbones/mix_transformer.py

Lines 203 to 220 in 65fa8cf

    
           class MixVisionTransformer(nn.Module): 
        
               def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dims=[64, 128, 256, 512], 
        
                            num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, drop_rate=0., 
        
                            attn_drop_rate=0., drop_path_rate=0., norm_layer=nn.LayerNorm, 
        
                            depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1]): 
        
                   super().__init__() 
        
                   self.num_classes = num_classes 
        
                   self.depths = depths 
        
                   # patch_embed 
        
                   self.patch_embed1 = OverlapPatchEmbed(img_size=img_size, patch_size=7, stride=4, in_chans=in_chans, 
        
                                                         embed_dim=embed_dims[0]) 
        
                   self.patch_embed2 = OverlapPatchEmbed(img_size=img_size // 4, patch_size=3, stride=2, in_chans=embed_dims[0], 
        
                                                         embed_dim=embed_dims[1]) 
        
                   self.patch_embed3 = OverlapPatchEmbed(img_size=img_size // 8, patch_size=3, stride=2, in_chans=embed_dims[1], 
        
                                                         embed_dim=embed_dims[2]) 
        
                   self.patch_embed4 = OverlapPatchEmbed(img_size=img_size // 16, patch_size=3, stride=2, in_chans=embed_dims[2], 
        
                                                         embed_dim=embed_dims[3])

I noticed that the argument patch_size is not actually being used for the OverlapPatchEmbed modules.
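One way to confirm this kind of observation mechanically is to parse the constructor and list the parameters that are never read in its body. The snippet below is only a rough sketch: `SOURCE` is a simplified stand-in that I wrote for illustration (not the real file), and `unused_params` is a hypothetical helper name.

```python
import ast
import textwrap

# Simplified stand-in for the constructor quoted above (NOT the real file):
# patch_size is accepted but only literal values are passed on.
SOURCE = textwrap.dedent('''
    class MixVisionTransformer:
        def __init__(self, img_size=224, patch_size=16, in_chans=3):
            self.patch_embed1 = dict(img_size=img_size, patch_size=7,
                                     stride=4, in_chans=in_chans)
''')


def unused_params(source, func_name="__init__"):
    """Return parameters of `func_name` that are never read in the function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            params = [a.arg for a in node.args.args if a.arg != "self"]
            # Keyword names such as `patch_size=7` are not ast.Name nodes,
            # so only genuine variable reads are collected here.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            return [p for p in params if p not in used]
    return []


print(unused_params(SOURCE))
```

On this stand-in the only parameter reported is `patch_size`; running the same helper on the actual source would be the real check.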

Instead, you hard-coded patch sizes of [7, 3, 3, 3] for the 4 blocks. These are of course still smaller than the 16x16 patches in ViT, and so still lend themselves better to detection and segmentation tasks, but the model deviates from the paper, where you describe an initial patch size of 4. It also means that classes inheriting from this class do not use the argument at all!
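To make the comparison with the paper concrete, the feature-map sizes implied by the hard-coded values can be worked out with the standard convolution output formula. This is a minimal sketch, assuming each `OverlapPatchEmbed` is a single strided convolution with padding `patch_size // 2` (my reading of the repository, not something visible in the excerpt above); `conv_out`, `STAGES` and `stage_sizes` are hypothetical names.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a 1-D convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# (patch_size, stride) pairs hard-coded in the excerpt above.
STAGES = [(7, 4), (3, 2), (3, 2), (3, 2)]


def stage_sizes(img_size=224):
    """Spatial size after each stage, ASSUMING padding = patch_size // 2."""
    sizes = []
    size = img_size
    for kernel, stride in STAGES:
        size = conv_out(size, kernel, stride, kernel // 2)
        sizes.append(size)
    return sizes


print(stage_sizes(224))  # [56, 28, 14, 7]
```

Under that assumption the first stage reduces the resolution by a factor of 4, the same as a non-overlapping 4x4 patch would (`conv_out(224, 4, 4, 0)` is also 56), so the difference from the paper's description may be in the overlapping 7x7 kernel rather than in the downsampling rate.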

Maybe I am misunderstanding something, so I would be happy if you could shed some light on this potential mistake! Thank you.

hubert10 · 2024-05-09T06:46:42Z

Hi Jonas, I've also observed something similar and this needs to be clarified either in the paper or in the code above!
