The precision of ResNet18 is .0? #1

talenz · 2021-06-08T01:40:17Z

The command I ran is "python -m src.train_resnet --config ../config/train_resnet18.yaml", and the accuracy is 0.0 after fine-tuning! Any idea what's causing it?

Training Epoch #9: loss: 7.25, accuracy: 0.02% (304/1281167)
Validation Epoch #9: 100%|████████████████████| 391/391 [00:36<00:00, 10.68it/s, loss=7.09, accuracy=0.08% (41/50000)]
Validation Epoch #9: loss (7.09), accuracy (0.08)
Done training!

una-dinosauria · 2021-06-08T01:52:56Z

Huh. Can you please share the entire log? Is this with a single GPU?

talenz · 2021-06-08T01:55:32Z

Hey, I was using a single GPU! Where can I find the entire log?

talenz · 2021-06-08T01:58:28Z

INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv1', 'layer1.1.conv1', 'layer2.0.conv1', 'layer2.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:01] Optimizing permutation for layer2.0.downsample.0
INFO:[2021/06/07 17:24:01] Greedy: 5.529302e-10 -> 2.744337e-10. Done in 0.01 seconds
layer2.0.downsample.0 2.301621e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:02<00:00, 4359.13it/s]INFO:[2021/06/07 17:24:04] SLS : 2.744337e-10 -> 2.301621e-10. Done in 2.30 seconds
INFO:[2021/06/07 17:24:04] layer2.0.downsample.0: prev covdet 5.529302e-10, new covdet: 2.301621e-10
INFO:[2021/06/07 17:24:04] Optimizing permutation for odict_keys(['layer2.1.conv1', 'layer3.0.conv1', 'layer3.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:04] Optimizing permutation for layer3.0.downsample.0
INFO:[2021/06/07 17:24:04] Greedy: 1.373051e-12 -> 1.138951e-12. Done in 0.02 seconds
layer3.0.downsample.0 1.078800e-12: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:05<00:00, 1763.26it/s]INFO:[2021/06/07 17:24:09] SLS : 1.138951e-12 -> 1.078800e-12. Done in 5.67 seconds
INFO:[2021/06/07 17:24:09] layer3.0.downsample.0: prev covdet 1.373051e-12, new covdet: 1.078800e-12
INFO:[2021/06/07 17:24:09] Optimizing permutation for odict_keys(['layer3.1.conv1', 'layer4.0.conv1', 'layer4.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:09] Optimizing permutation for layer4.0.downsample.0
INFO:[2021/06/07 17:24:10] Greedy: 1.327052e-12 -> 7.464063e-13. Done in 0.04 seconds
layer4.0.downsample.0 7.169530e-13: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:16<00:00, 598.93it/s]INFO:[2021/06/07 17:24:26] SLS : 7.464063e-13 -> 7.169530e-13. Done in 16.70 seconds
INFO:[2021/06/07 17:24:26] layer4.0.downsample.0: prev covdet 1.327052e-12, new covdet: 7.169530e-13
INFO:[2021/06/07 17:24:26] Optimizing permutation for odict_keys(['layer4.1.conv1', 'fc']) with 6 parents
INFO:[2021/06/07 17:24:26] Optimizing permutation for fc
INFO:[2021/06/07 17:24:26] Greedy: 5.416843e-10 -> 5.294320e-10. Done in 0.08 seconds
fc 5.158992e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [01:14<00:00, 134.48it/s]INFO:[2021/06/07 17:25:41] SLS : 5.294320e-10 -> 5.158992e-10. Done in 74.37 seconds
INFO:[2021/06/07 17:25:41] fc: prev covdet 5.416843e-10, new covdet: 5.158992e-10

una-dinosauria · 2021-06-08T02:13:29Z

Yep that looks like the log, but definitely not all of it.

talenz · 2021-06-08T02:28:15Z

INFO:[2021/06/07 17:23:58] {
"dataloader": {
"batch_size": 128,
"imagenet_path": "imagenet",
"num_workers": 20,
"train_shuffle": true,
"validation_shuffle": false
},
"epochs": 9,
"learning_rate": 0.001,
"lr_scheduler": {
"min_lr": 1e-06,
"type": "cosine"
},
"model": {
"arch": "resnet18",
"compression_parameters": {
"fc_subvector_size": 4,
"ignored_modules": [
"conv1"
],
"k": 256,
"k_means_n_iters": 10,
"k_means_type": "src",
"large_subvectors": false,
"layer_specs": {
"fc": {
"k": 2048,
"k_means_type": "src"
}
},
"pw_subvector_size": 4
},
"permutations": [
[
{
"parents": [
"layer1.0.conv1",
"layer1.0.bn1"
]
},
{
"children": [
"layer1.0.conv2"
]
}
],
[
{
"parents": [
"layer1.1.conv1",
"layer1.1.bn1"
]
},
{
"children": [
"layer1.1.conv2"
]
}
],
[
{
"parents": [
"layer2.0.conv1",
"layer2.0.bn1"
]
},
{
"children": [
"layer2.0.conv2"
]
}
],
[
{
"parents": [
"layer2.1.conv1",
"layer2.1.bn1"
]
},
{
"children": [
"layer2.1.conv2"
]
}
],
[
{
"parents": [
"layer3.0.conv1",
"layer3.0.bn1"
]
},
{
"children": [
"layer3.0.conv2"
]
}
],
[
{
"parents": [
"layer3.1.conv1",
"layer3.1.bn1"
]
},
{
"children": [
"layer3.1.conv2"
]
}
],
[
{
"parents": [
"layer4.0.conv1",
"layer4.0.bn1"
]
},
{
"children": [
"layer4.0.conv2"
]
}
],
[
{
"parents": [
"layer4.1.conv1",
"layer4.1.bn1"
]
},
{
"children": [
"layer4.1.conv2"
]
}
],
[
{
"parents": [
"conv1",
"bn1",
"layer1.0.conv2",
"layer1.0.bn2",
"layer1.1.conv2",
"layer1.1.bn2"
]
},
{
"children": [
"layer1.0.conv1",
"layer1.1.conv1",
"layer2.0.conv1",
"layer2.0.downsample.0"
]
}
],
[
{
"parents": [
"layer2.0.downsample.0",
"layer2.0.downsample.1",
"layer2.0.conv2",
"layer2.0.bn2",
"layer2.1.conv2",
"layer2.1.bn2"
]
},
{
"children": [
"layer2.1.conv1",
"layer3.0.conv1",
"layer3.0.downsample.0"
]
}
],
[
{
"parents": [
"layer3.0.downsample.0",
"layer3.0.downsample.1",
"layer3.0.conv2",
"layer3.0.bn2",
"layer3.1.conv2",
"layer3.1.bn2"
]
},
{
"children": [
"layer3.1.conv1",
"layer4.0.conv1",
"layer4.0.downsample.0"
]
}
],
[
{
"parents": [
"layer4.0.downsample.0",
"layer4.0.downsample.1",
"layer4.0.conv2",
"layer4.0.bn2",
"layer4.1.conv2",
"layer4.1.bn2"
]
},
{
"children": [
"layer4.1.conv1",
"fc"
]
}
]
],
"sls_iterations": 10000,
"use_permutations": true
},
"momentum": 0.9,
"optimizer": "adam",
"output_path": "<your_output_path_here>",
"skip_initial_validation": false,
"weight_decay": 0.0001
}
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv1', 'layer1.1.conv1', 'layer2.0.conv1', 'layer2.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:01] Optimizing permutation for layer2.0.downsample.0
INFO:[2021/06/07 17:24:01] Greedy: 5.529302e-10 -> 2.744337e-10. Done in 0.01 seconds
layer2.0.downsample.0 2.301621e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:02<00:00, 4359.13it/s]INFO:[2021/06/07 17:24:04] SLS : 2.744337e-10 -> 2.301621e-10. Done in 2.30 seconds
INFO:[2021/06/07 17:24:04] layer2.0.downsample.0: prev covdet 5.529302e-10, new covdet: 2.301621e-10
INFO:[2021/06/07 17:24:04] Optimizing permutation for odict_keys(['layer2.1.conv1', 'layer3.0.conv1', 'layer3.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:04] Optimizing permutation for layer3.0.downsample.0
INFO:[2021/06/07 17:24:04] Greedy: 1.373051e-12 -> 1.138951e-12. Done in 0.02 seconds
layer3.0.downsample.0 1.078800e-12: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:05<00:00, 1763.26it/s]INFO:[2021/06/07 17:24:09] SLS : 1.138951e-12 -> 1.078800e-12. Done in 5.67 seconds
INFO:[2021/06/07 17:24:09] layer3.0.downsample.0: prev covdet 1.373051e-12, new covdet: 1.078800e-12
INFO:[2021/06/07 17:24:09] Optimizing permutation for odict_keys(['layer3.1.conv1', 'layer4.0.conv1', 'layer4.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:09] Optimizing permutation for layer4.0.downsample.0
INFO:[2021/06/07 17:24:10] Greedy: 1.327052e-12 -> 7.464063e-13. Done in 0.04 seconds
layer4.0.downsample.0 7.169530e-13: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:16<00:00, 598.93it/s]INFO:[2021/06/07 17:24:26] SLS : 7.464063e-13 -> 7.169530e-13. Done in 16.70 seconds
INFO:[2021/06/07 17:24:26] layer4.0.downsample.0: prev covdet 1.327052e-12, new covdet: 7.169530e-13
INFO:[2021/06/07 17:24:26] Optimizing permutation for odict_keys(['layer4.1.conv1', 'fc']) with 6 parents
INFO:[2021/06/07 17:24:26] Optimizing permutation for fc
INFO:[2021/06/07 17:24:26] Greedy: 5.416843e-10 -> 5.294320e-10. Done in 0.08 seconds
fc 5.158992e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [01:14<00:00, 134.48it/s]INFO:[2021/06/07 17:25:41] SLS : 5.294320e-10 -> 5.158992e-10. Done in 74.37 seconds
INFO:[2021/06/07 17:25:41] fc: prev covdet 5.416843e-10, new covdet: 5.158992e-10
INFO:[2021/06/07 17:25:41] layer1.0.conv1 compression: 10; mse: 4.434068e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer1.0.conv2 compression: 10; mse: 3.580092e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer1.1.conv1 compression: 10; mse: 4.652465e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer1.1.conv2 compression: 10; mse: 3.743610e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer2.0.conv1 compression: 10; mse: 3.415691e-03; codebook size: 256 x 9; code size: 128 x 64
INFO:[2021/06/07 17:25:41] layer2.0.conv2 compression: 10; mse: 2.482994e-03; codebook size: 256 x 9; code size: 128 x 128
INFO:[2021/06/07 17:25:41] layer2.0.downsample.0 compression: 10; mse: 1.262195e-03; codebook size: 256 x 4; code size: 128 x 16
INFO:[2021/06/07 17:25:41] layer2.1.conv1 compression: 10; mse: 2.888829e-03; codebook size: 256 x 9; code size: 128 x 128
INFO:[2021/06/07 17:25:41] layer2.1.conv2 compression: 10; mse: 2.073620e-03; codebook size: 256 x 9; code size: 128 x 128
INFO:[2021/06/07 17:25:41] layer3.0.conv1 compression: 10; mse: 1.135532e-03; codebook size: 256 x 9; code size: 256 x 128
INFO:[2021/06/07 17:25:41] layer3.0.conv2 compression: 10; mse: 1.265556e-03; codebook size: 256 x 9; code size: 256 x 256
INFO:[2021/06/07 17:25:41] layer3.0.downsample.0 compression: 10; mse: 3.959292e-04; codebook size: 256 x 4; code size: 256 x 32
INFO:[2021/06/07 17:25:41] layer3.1.conv1 compression: 10; mse: 1.148529e-03; codebook size: 256 x 9; code size: 256 x 256
INFO:[2021/06/07 17:25:41] layer3.1.conv2 compression: 10; mse: 8.906376e-04; codebook size: 256 x 9; code size: 256 x 256
INFO:[2021/06/07 17:25:41] layer4.0.conv1 compression: 10; mse: 6.173717e-04; codebook size: 256 x 9; code size: 512 x 256
INFO:[2021/06/07 17:25:41] layer4.0.conv2 compression: 10; mse: 6.496188e-04; codebook size: 256 x 9; code size: 512 x 512
INFO:[2021/06/07 17:25:41] layer4.0.downsample.0 compression: 10; mse: 3.887021e-04; codebook size: 256 x 4; code size: 512 x 64
INFO:[2021/06/07 17:25:41] layer4.1.conv1 compression: 10; mse: 4.462329e-04; codebook size: 256 x 9; code size: 512 x 512
INFO:[2021/06/07 17:25:41] layer4.1.conv2 compression: 10; mse: 6.987687e-05; codebook size: 256 x 9; code size: 512 x 512
INFO:[2021/06/07 17:25:44] fc compression: 10; mse: 6.214550e-04; codebook size: 2048 x 4; code size: 1000 x 128
INFO:[2021/06/07 17:25:44]
uncompressed (bits): 374064384
compressed (bits): 12927232
uncompressed (MB): 44.59
compressed (MB): 1.54
compression ratio: 28.94
Validation Epoch #0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:36<00:00, 10.75it/s, loss=6.29, accuracy=5.57% (2784/50000)]Validation Epoch #0: loss (6.29), accuracy (5.57)
Training Epoch #1: 78%|██████████████████████████████████████████████████████████████████████████████████████████████████ | 7790/10010 [16:05<04:49, 7.68it/s, loss=7.2, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:40<00:00, 8.07it/s, loss=9.66, accuracy=0]Training Epoch #1: loss: 7.78, accuracy: 0.07% (937/1281167)
Validation Epoch #1: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:43<00:00, 8.91it/s, loss=8.83, accuracy=0.08% (39/50000)]Validation Epoch #1: loss (8.83), accuracy (0.08)
Training Epoch #2: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7794/10010 [16:26<04:52, 7.57it/s, loss=6.09, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #2: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [21:06<00:00, 7.90it/s, loss=8.14, accuracy=0]Training Epoch #2: loss: 6.99, accuracy: 0.00% (2/1281167)
Validation Epoch #2: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:39<00:00, 9.83it/s, loss=8.17, accuracy=0.12% (58/50000)]Validation Epoch #2: loss (8.17), accuracy (0.12)
Training Epoch #3: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:02<04:25, 8.36it/s, loss=7.01, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #3: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:35<00:00, 8.10it/s, loss=8.94, accuracy=0]Training Epoch #3: loss: 6.59, accuracy: 0.01% (138/1281167)
Validation Epoch #3: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:35<00:00, 11.10it/s, loss=9.10, accuracy=0.08% (40/50000)]Validation Epoch #3: loss (9.10), accuracy (0.08)
Training Epoch #4: 78%|██████████████████████████████████████████████████████████████████████████████████████████████████ | 7790/10010 [16:09<04:28, 8.27it/s, loss=6.7, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #4: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:44<00:00, 8.04it/s, loss=6.46, accuracy=0]Training Epoch #4: loss: 6.44, accuracy: 0.02% (262/1281167)
Validation Epoch #4: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:36<00:00, 10.77it/s, loss=9.12, accuracy=0.10% (49/50000)]Validation Epoch #4: loss (9.12), accuracy (0.10)
Training Epoch #5: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:07<04:29, 8.24it/s, loss=6.06, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #5: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:43<00:00, 8.05it/s, loss=5.62, accuracy=0]Training Epoch #5: loss: 6.43, accuracy: 0.02% (239/1281167)
Validation Epoch #5: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:47<00:00, 8.29it/s, loss=9.33, accuracy=0.10% (52/50000)]Validation Epoch #5: loss (9.33), accuracy (0.10)
Training Epoch #6: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:04<04:20, 8.53it/s, loss=6.73, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #6: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:36<00:00, 8.10it/s, loss=12.6, accuracy=0]Training Epoch #6: loss: 6.22, accuracy: 0.09% (1117/1281167)
Validation Epoch #6: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:35<00:00, 11.08it/s, loss=9.18, accuracy=0.09% (46/50000)]Validation Epoch #6: loss (9.18), accuracy (0.09)
Training Epoch #7: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:02<04:41, 7.88it/s, loss=7.38, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #7: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:36<00:00, 8.10it/s, loss=6.59, accuracy=0]Training Epoch #7: loss: 6.33, accuracy: 0.06% (776/1281167)
Validation Epoch #7: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:42<00:00, 9.20it/s, loss=8.28, accuracy=0.10% (50/50000)]Validation Epoch #7: loss (8.28), accuracy (0.10)
Training Epoch #8: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:07<04:30, 8.21it/s, loss=7.35, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #8: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:43<00:00, 8.05it/s, loss=5.31, accuracy=0]Training Epoch #8: loss: 6.73, accuracy: 0.06% (753/1281167)
Validation Epoch #8: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:34<00:00, 11.38it/s, loss=8.29, accuracy=0.10% (52/50000)]Validation Epoch #8: loss (8.29), accuracy (0.10)
Training Epoch #9: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:12<04:31, 8.17it/s, loss=7.36, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Training Epoch #9: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:46<00:00, 8.03it/s, loss=6.63, accuracy=0]Training Epoch #9: loss: 7.25, accuracy: 0.02% (304/1281167)
Validation Epoch #9: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:36<00:00, 10.68it/s, loss=7.09, accuracy=0.08% (41/50000)]Validation Epoch #9: loss (7.09), accuracy (0.08)
Done training!

This should be the whole log.
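The size summary in the log can be checked by hand. The sketch below is only a sanity check on the reported figures, not code from the repository; it assumes that "MB" in the log means 2**20 bytes, which is the interpretation that reproduces the printed values. The uncompressed figure also works out to 11,689,512 32-bit values, which matches the parameter count usually quoted for torchvision's ResNet18.

```python
# Values copied from the size summary in the log.
UNCOMPRESSED_BITS = 374_064_384
COMPRESSED_BITS = 12_927_232


def bits_to_mib(bits):
    """Convert a bit count to mebibytes (8 bits per byte, 2**20 bytes per MiB).

    Assumption: the log's "MB" uses 2**20 bytes; this is inferred from the
    numbers, not taken from the repository.
    """
    return bits / 8 / 2**20


uncompressed_mb = round(bits_to_mib(UNCOMPRESSED_BITS), 2)
compressed_mb = round(bits_to_mib(COMPRESSED_BITS), 2)
compression_ratio = round(UNCOMPRESSED_BITS / COMPRESSED_BITS, 2)

# Number of 32-bit values that the uncompressed size corresponds to.
float32_values = UNCOMPRESSED_BITS // 32

print(uncompressed_mb, compressed_mb, compression_ratio)  # 44.59 1.54 28.94
print(float32_values)  # 11689512
```

So the compression step itself reports sensible sizes; the problem shows up only in the accuracy numbers that follow.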

una-dinosauria · 2021-06-08T03:26:11Z

It seems like the accuracy after the initial compression is too low (5%). I'll try to reproduce on my end, thanks for bringing this up!
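For context on the later epochs, here is a back-of-the-envelope check of what chance level looks like on this task. It is only a sketch: it assumes the 1000-class ImageNet setup (the fc code size in the log is 1000 x 128) and a standard cross-entropy loss with natural logarithms, neither of which is confirmed by the log itself.

```python
import math

NUM_CLASSES = 1000   # assumption: ImageNet-1k, suggested by the 1000-row fc layer
VAL_IMAGES = 50_000  # from the validation progress bar in the log

# A classifier that guesses uniformly at random is right 1 time in NUM_CLASSES.
chance_accuracy_pct = 100.0 / NUM_CLASSES

# Cross-entropy of a uniform prediction over NUM_CLASSES classes.
chance_loss = math.log(NUM_CLASSES)

# Expected number of correct validation predictions under random guessing.
expected_correct = VAL_IMAGES / NUM_CLASSES

print(f"chance accuracy: {chance_accuracy_pct:.2f}%")      # 0.10%
print(f"chance cross-entropy: {chance_loss:.2f}")          # 6.91
print(f"expected correct by chance: {expected_correct:.0f} of {VAL_IMAGES}")  # 50
```

The validation results for epochs 1 through 9 (39 to 58 correct out of 50000, i.e. 0.08% to 0.12%) sit right at that level, so fine-tuning is not just failing to help: it drops the network from 5.57% to random guessing.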

talenz · 2021-06-08T03:37:28Z

> It seems like the accuracy after the initial compression is too low (5%). I'll try to reproduce on my end, thanks for bringing this up!

Thanks! Waiting for your info~

lewin4 · 2021-08-03T03:45:40Z

I'm seeing a similar situation: the accuracy stays at 5% and does not change!

una-dinosauria · 2021-08-03T23:46:42Z

Are you seeing this with other models too? Or just with ResNet18?
Could any of you provide a docker image to reproduce your error? (I should have provided one to reproduce our experiments, sorry about that)

una-dinosauria · 2021-09-07T00:16:02Z

Hello @talenz and @lewin4,

I have managed to reproduce the issue that you are reporting.

I apologize. We developed this code on machines running distributed training with Horovod, so we missed a bug in the dataloader: as written, the ImageNet training dataloader does not shuffle the training data.

You can replace line 70 of permute-quantize-finetune/src/dataloading/imagenet_loader.py (at commit 53a30ba):

    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, sampler=sampler, pin_memory=True)

with

    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        shuffle=(sampler is None),
        sampler=sampler,
        pin_memory=True
    )

and that should bring back training numbers that make sense.
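To illustrate why the missing shuffle matters, here is a minimal pure-Python sketch (not code from the repository). It assumes the training set is stored in class-sorted order, as is typical for folder-per-class ImageNet datasets; the helper names `make_class_sorted_labels`, `batches`, and `mean_classes_per_batch` are hypothetical. Reading such a dataset sequentially, which is what a PyTorch `DataLoader` does when given neither `shuffle=True` nor a sampler, produces batches that each contain a single class.

```python
import random


def make_class_sorted_labels(num_classes, per_class):
    """Labels of a toy dataset stored class by class (assumed ImageNet-like layout)."""
    return [c for c in range(num_classes) for _ in range(per_class)]


def batches(labels, batch_size, shuffle, seed=0):
    """Yield batches of labels, visiting samples sequentially or in random order."""
    order = list(range(len(labels)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [labels[i] for i in order[start:start + batch_size]]


def mean_classes_per_batch(labels, batch_size, shuffle):
    """Average number of distinct classes seen in one batch."""
    counts = [len(set(batch)) for batch in batches(labels, batch_size, shuffle)]
    return sum(counts) / len(counts)


# Toy stand-in: 100 classes with 128 samples each, batch size 128 as in the config.
labels = make_class_sorted_labels(num_classes=100, per_class=128)
sequential = mean_classes_per_batch(labels, batch_size=128, shuffle=False)
shuffled = mean_classes_per_batch(labels, batch_size=128, shuffle=True)

print("distinct classes per batch, sequential:", sequential)
print("distinct classes per batch, shuffled:  ", shuffled)
```

With sequential reads every batch here holds exactly one class, so each gradient step pushes the network toward predicting that class only; with shuffling, each batch mixes many classes. Writing `shuffle=(sampler is None)` rather than `shuffle=True` keeps the distributed path working, because `DataLoader` rejects `shuffle=True` when a sampler is also passed.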

Also, please note that I don't have write access to this repo anymore, so I will push patches to my personal fork at https://github.com/una-dinosauria/permute-quantize-finetune/. I'll let you know here when that repo is patched.

Again, sorry for this mistake, and thank you so much for reporting this issue.

Cheers,

una-dinosauria · 2021-09-07T00:26:22Z

Fixed by una-dinosauria#2.

Cheers,

una-dinosauria · 2021-09-07T01:10:33Z

I've also added a docker image to make it easier to reproduce our results: una-dinosauria#3

una-dinosauria mentioned this issue Sep 7, 2021

fix shuffle error una-dinosauria/permute-quantize-finetune#2

Merged
