Does the input tensor need to be resized to (224, 224)? #237
Hey @kitaharasetusna 👋

This project was meant to bring as much flexibility as possible, so it does not introduce a bias towards any image size. However, the models you're using may do so: if you use a resnet18 trained on (224, 224) inputs, passing (32, 32) inputs is unlikely to produce good results.

To give you a better understanding of the inner workings of a resnet18, using TorchScan:

```python
from torchscan import summary
from torchvision.models import resnet18

model = resnet18().eval()
# For a (224, 224) input, layer4's feature maps come out as (512, 7, 7).
summary(model, (3, 224, 224), max_depth=2)
```
Depending on the layer you want to extract the CAM from, usually the last conv layer (here in `layer4`), you need an unflattened tensor. As the summary shows, before the pooling layer, the smallest feature map still has a (7, 7) spatial size, which can easily be upscaled for visualization (see the sketch just below).
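For illustration, here is a minimal sketch of that upscaling step using bilinear interpolation; the `cam` tensor is a hypothetical stand-in for a (7, 7) activation map extracted from `layer4`:

```python
import torch
import torch.nn.functional as F

# Hypothetical CAM from layer4 for a (224, 224) input, with batch and
# channel dimensions added so that interpolate() accepts it.
cam = torch.rand(1, 1, 7, 7)

# Upscale to the input resolution for visualization.
upscaled = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
print(upscaled.shape)  # torch.Size([1, 1, 224, 224])
```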
Now compare with a (32, 32) input:

```python
# For a (32, 32) input, layer4's feature maps collapse to (512, 1, 1).
summary(model, (3, 32, 32), max_depth=2)
```

Here, because the input passes through size-reducing layers, the `layer4` feature map ends up flattened to (1, 1). That can't be upscaled in any useful way, since it would only produce a uniform map. However, `layer3` still has a (2, 2) spatial size.

So, in short: you don't strictly have to resize to (224, 224), but the input should both match what your model was trained on and be large enough for the layer you extract the CAM from to keep a usable spatial size. A sketch of how that looks with this library follows.
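As a rough, non-authoritative sketch of typical usage on a (224, 224) input (assuming a recent torchcam release where extractors live in `torchcam.methods`; older versions exposed them under `torchcam.cams`):

```python
import torch
from torchvision.models import resnet18
from torchcam.methods import GradCAM  # torchcam.cams in older releases

# Random weights are enough to demonstrate shapes; load pretrained or your
# own weights for a meaningful CAM.
model = resnet18().eval()

# Hook the last conv block, which keeps a (7, 7) spatial size for (224, 224) inputs.
cam_extractor = GradCAM(model, target_layer="layer4")

# Dummy (224, 224) input standing in for a properly preprocessed image.
input_tensor = torch.rand(1, 3, 224, 224)

out = model(input_tensor)
# Retrieve the CAM for the top predicted class.
activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)
print(activation_map[0].shape)  # spatial size matches layer4's (7, 7) feature map
```

The resulting map can then be upscaled or overlaid on the original image (e.g. with `torchcam.utils.overlay_mask`) precisely because it still has a non-trivial spatial size.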
`grad_cam(model=model, image_tensor=...)` isn't using this library, so I can't help over there 😅

Let me know if you have any additional questions, take care ✌️