Mismatch Decoder when Training a Network #173

matteo-collina · 2024-12-26T18:42:01Z

Hi,

I am using the latest version of TagLab and I get this error while training a network:

RuntimeError: Error(s) in loading state_dict for DeepLab:
    size mismatch for decoder.last_conv.8.weight: copying a param with shape torch.Size([41, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([40, 256, 1, 1]).
    size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([41]) from checkpoint, the shape in current model is torch.Size([40]).

I made a hotfix by adding +1 at line 567 of training.py, like this:

net = DeepLab(backbone='resnet', output_stride=16, num_classes=(output_classes+1))

but I am not sure this is a good approach. The model trains, but the result does not work: the graph produced at the end does not show any values, and during training I get "nan" starting from epoch 3. I get the same error if I try to auto-segment, but in that case it is raised from MapClassifier.py.

Moreover, there is another problem at line 32 of coral_dataset.py:

#PixelDropout(always_apply=False, p=0.2, dropout_prob=0.02, per_channel=0, drop_value=(0, 0, 0), #mask_drop_value=None)

According to the documentation, drop_value should be a float or a sequence of floats (https://albumentations.ai/docs/api_reference/augmentations/transforms/). I fixed it by using None, but again I am not sure this is the right way to proceed.

How can I solve these issues?

Cheers,
Matteo


maxcorsini · 2024-12-27T08:59:50Z

Hi, it seems that the number of classes set for the network does not match the number of classes used during training. This is not a problem with TagLab but a problem with the input data. Please send me by email some information about the dictionary you are using, the classes selected to build the classifier, and what you have done to set up the dataset for training.
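To see which layers disagree before changing num_classes, it can help to compare parameter shapes directly. The snippet below is only a minimal sketch and is not part of TagLab: the helper name find_shape_mismatches is hypothetical, and the shapes are copied from the error message above. The first dimension of decoder.last_conv.8.weight is the number of output classes, so 41 vs. 40 means the checkpoint was saved with one more class than the current model was built with.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ.

    Hypothetical helper for illustration; both arguments map parameter
    names to shape tuples.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only report parameters present in both and differing in shape.
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes taken from the error message reported in this issue.
checkpoint_shapes = {
    "decoder.last_conv.8.weight": (41, 256, 1, 1),
    "decoder.last_conv.8.bias": (41,),
}
model_shapes = {
    "decoder.last_conv.8.weight": (40, 256, 1, 1),
    "decoder.last_conv.8.bias": (40,),
}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs. current model {model}")
```

With PyTorch, the two dictionaries could be built from real objects with something like {k: tuple(v.shape) for k, v in state_dict.items()}, assuming the checkpoint file holds a plain state_dict (it may instead be nested under another key).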

Regarding the second problem, if you are using the latest version of albumentations, you can replace drop_value = (0,0,0) with drop_value = [0,0,0].

Best

matteo-collina · 2024-12-27T09:37:53Z

Thanks for your help. I just sent you an email.

Best

matteo-collina changed the title ~~Mismatch Decoder when Train a Network~~ Mismatch Decoder when Training a Network Dec 26, 2024
