The precision of ResNet18 is .0? #1

Open
talenz opened this issue Jun 8, 2021 · 12 comments

Comments

@talenz

talenz commented Jun 8, 2021

I ran "python -m src.train_resnet --config ../config/train_resnet18.yaml" and the accuracy after fine-tuning is 0.0! Any idea what's causing it?

Training Epoch #9: loss: 7.25, accuracy: 0.02% (304/1281167)
Validation Epoch #9: 100%|████████████████████| 391/391 [00:36<00:00, 10.68it/s, loss=7.09, accuracy=0.08% (41/50000)]
Validation Epoch #9: loss (7.09), accuracy (0.08)
Done training!
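A note on the numbers above: with 1000 ImageNet classes, random guessing gives about 0.1% top-1 accuracy (about 50 of the 50,000 validation images), so the reported 0.08% is chance level rather than a literal zero. A small sketch of that arithmetic, using only the counts printed in the log (the `accuracy_pct` helper is illustrative, not part of the repo):

```python
# Chance-level check for the epoch #9 numbers reported in the log.
NUM_CLASSES = 1000          # ImageNet-1k
VAL_IMAGES = 50_000         # from "(41/50000)"
TRAIN_IMAGES = 1_281_167    # from "(304/1281167)"


def accuracy_pct(correct: int, total: int) -> float:
    """Top-1 accuracy as a percentage, formatted like the training log."""
    return 100.0 * correct / total


chance_pct = 100.0 / NUM_CLASSES
expected_correct_by_chance = VAL_IMAGES / NUM_CLASSES

val_pct = accuracy_pct(41, VAL_IMAGES)
train_pct = accuracy_pct(304, TRAIN_IMAGES)

print(f"chance level:       {chance_pct:.2f}% (~{expected_correct_by_chance:.0f} of {VAL_IMAGES})")
print(f"validation epoch 9: {val_pct:.2f}%")
print(f"training epoch 9:   {train_pct:.2f}%")
```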

@una-dinosauria
Contributor

Huh. Can you please share the entire log? Is this with a single GPU?

@talenz
Author

talenz commented Jun 8, 2021

Hey, I was using a single GPU! Where can I find the entire log?

@talenz
Author

talenz commented Jun 8, 2021

INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv1', 'layer1.1.conv1', 'layer2.0.conv1', 'layer2.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:01] Optimizing permutation for layer2.0.downsample.0
INFO:[2021/06/07 17:24:01] Greedy: 5.529302e-10 -> 2.744337e-10. Done in 0.01 seconds
layer2.0.downsample.0 2.301621e-10: 100%|████████████████████| 10000/10000 [00:02<00:00, 4359.13it/s]
INFO:[2021/06/07 17:24:04] SLS : 2.744337e-10 -> 2.301621e-10. Done in 2.30 seconds
INFO:[2021/06/07 17:24:04] layer2.0.downsample.0: prev covdet 5.529302e-10, new covdet: 2.301621e-10
INFO:[2021/06/07 17:24:04] Optimizing permutation for odict_keys(['layer2.1.conv1', 'layer3.0.conv1', 'layer3.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:04] Optimizing permutation for layer3.0.downsample.0
INFO:[2021/06/07 17:24:04] Greedy: 1.373051e-12 -> 1.138951e-12. Done in 0.02 seconds
layer3.0.downsample.0 1.078800e-12: 100%|████████████████████| 10000/10000 [00:05<00:00, 1763.26it/s]
INFO:[2021/06/07 17:24:09] SLS : 1.138951e-12 -> 1.078800e-12. Done in 5.67 seconds
INFO:[2021/06/07 17:24:09] layer3.0.downsample.0: prev covdet 1.373051e-12, new covdet: 1.078800e-12
INFO:[2021/06/07 17:24:09] Optimizing permutation for odict_keys(['layer3.1.conv1', 'layer4.0.conv1', 'layer4.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:09] Optimizing permutation for layer4.0.downsample.0
INFO:[2021/06/07 17:24:10] Greedy: 1.327052e-12 -> 7.464063e-13. Done in 0.04 seconds
layer4.0.downsample.0 7.169530e-13: 100%|████████████████████| 10000/10000 [00:16<00:00, 598.93it/s]
INFO:[2021/06/07 17:24:26] SLS : 7.464063e-13 -> 7.169530e-13. Done in 16.70 seconds
INFO:[2021/06/07 17:24:26] layer4.0.downsample.0: prev covdet 1.327052e-12, new covdet: 7.169530e-13
INFO:[2021/06/07 17:24:26] Optimizing permutation for odict_keys(['layer4.1.conv1', 'fc']) with 6 parents
INFO:[2021/06/07 17:24:26] Optimizing permutation for fc
INFO:[2021/06/07 17:24:26] Greedy: 5.416843e-10 -> 5.294320e-10. Done in 0.08 seconds
fc 5.158992e-10: 100%|████████████████████| 10000/10000 [01:14<00:00, 134.48it/s]
INFO:[2021/06/07 17:25:41] SLS : 5.294320e-10 -> 5.158992e-10. Done in 74.37 seconds
INFO:[2021/06/07 17:25:41] fc: prev covdet 5.416843e-10, new covdet: 5.158992e-10

@una-dinosauria
Contributor

Yep, that looks like the log, but it's definitely not all of it.

@talenz
Author

talenz commented Jun 8, 2021

INFO:[2021/06/07 17:23:58] {
"dataloader": {
"batch_size": 128,
"imagenet_path": "imagenet",
"num_workers": 20,
"train_shuffle": true,
"validation_shuffle": false
},
"epochs": 9,
"learning_rate": 0.001,
"lr_scheduler": {
"min_lr": 1e-06,
"type": "cosine"
},
"model": {
"arch": "resnet18",
"compression_parameters": {
"fc_subvector_size": 4,
"ignored_modules": [
"conv1"
],
"k": 256,
"k_means_n_iters": 10,
"k_means_type": "src",
"large_subvectors": false,
"layer_specs": {
"fc": {
"k": 2048,
"k_means_type": "src"
}
},
"pw_subvector_size": 4
},
"permutations": [
[
{
"parents": [
"layer1.0.conv1",
"layer1.0.bn1"
]
},
{
"children": [
"layer1.0.conv2"
]
}
],
[
{
"parents": [
"layer1.1.conv1",
"layer1.1.bn1"
]
},
{
"children": [
"layer1.1.conv2"
]
}
],
[
{
"parents": [
"layer2.0.conv1",
"layer2.0.bn1"
]
},
{
"children": [
"layer2.0.conv2"
]
}
],
[
{
"parents": [
"layer2.1.conv1",
"layer2.1.bn1"
]
},
{
"children": [
"layer2.1.conv2"
]
}
],
[
{
"parents": [
"layer3.0.conv1",
"layer3.0.bn1"
]
},
{
"children": [
"layer3.0.conv2"
]
}
],
[
{
"parents": [
"layer3.1.conv1",
"layer3.1.bn1"
]
},
{
"children": [
"layer3.1.conv2"
]
}
],
[
{
"parents": [
"layer4.0.conv1",
"layer4.0.bn1"
]
},
{
"children": [
"layer4.0.conv2"
]
}
],
[
{
"parents": [
"layer4.1.conv1",
"layer4.1.bn1"
]
},
{
"children": [
"layer4.1.conv2"
]
}
],
[
{
"parents": [
"conv1",
"bn1",
"layer1.0.conv2",
"layer1.0.bn2",
"layer1.1.conv2",
"layer1.1.bn2"
]
},
{
"children": [
"layer1.0.conv1",
"layer1.1.conv1",
"layer2.0.conv1",
"layer2.0.downsample.0"
]
}
],
[
{
"parents": [
"layer2.0.downsample.0",
"layer2.0.downsample.1",
"layer2.0.conv2",
"layer2.0.bn2",
"layer2.1.conv2",
"layer2.1.bn2"
]
},
{
"children": [
"layer2.1.conv1",
"layer3.0.conv1",
"layer3.0.downsample.0"
]
}
],
[
{
"parents": [
"layer3.0.downsample.0",
"layer3.0.downsample.1",
"layer3.0.conv2",
"layer3.0.bn2",
"layer3.1.conv2",
"layer3.1.bn2"
]
},
{
"children": [
"layer3.1.conv1",
"layer4.0.conv1",
"layer4.0.downsample.0"
]
}
],
[
{
"parents": [
"layer4.0.downsample.0",
"layer4.0.downsample.1",
"layer4.0.conv2",
"layer4.0.bn2",
"layer4.1.conv2",
"layer4.1.bn2"
]
},
{
"children": [
"layer4.1.conv1",
"fc"
]
}
]
],
"sls_iterations": 10000,
"use_permutations": true
},
"momentum": 0.9,
"optimizer": "adam",
"output_path": "<your_output_path_here>",
"skip_initial_validation": false,
"weight_decay": 0.0001
}
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.0.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.1.conv2']) with 2 parents
INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping.
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv1', 'layer1.1.conv1', 'layer2.0.conv1', 'layer2.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:01] Optimizing permutation for layer2.0.downsample.0
INFO:[2021/06/07 17:24:01] Greedy: 5.529302e-10 -> 2.744337e-10. Done in 0.01 seconds
layer2.0.downsample.0 2.301621e-10: 100%|████████████████████| 10000/10000 [00:02<00:00, 4359.13it/s]
INFO:[2021/06/07 17:24:04] SLS : 2.744337e-10 -> 2.301621e-10. Done in 2.30 seconds
INFO:[2021/06/07 17:24:04] layer2.0.downsample.0: prev covdet 5.529302e-10, new covdet: 2.301621e-10
INFO:[2021/06/07 17:24:04] Optimizing permutation for odict_keys(['layer2.1.conv1', 'layer3.0.conv1', 'layer3.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:04] Optimizing permutation for layer3.0.downsample.0
INFO:[2021/06/07 17:24:04] Greedy: 1.373051e-12 -> 1.138951e-12. Done in 0.02 seconds
layer3.0.downsample.0 1.078800e-12: 100%|████████████████████| 10000/10000 [00:05<00:00, 1763.26it/s]
INFO:[2021/06/07 17:24:09] SLS : 1.138951e-12 -> 1.078800e-12. Done in 5.67 seconds
INFO:[2021/06/07 17:24:09] layer3.0.downsample.0: prev covdet 1.373051e-12, new covdet: 1.078800e-12
INFO:[2021/06/07 17:24:09] Optimizing permutation for odict_keys(['layer3.1.conv1', 'layer4.0.conv1', 'layer4.0.downsample.0']) with 6 parents
INFO:[2021/06/07 17:24:09] Optimizing permutation for layer4.0.downsample.0
INFO:[2021/06/07 17:24:10] Greedy: 1.327052e-12 -> 7.464063e-13. Done in 0.04 seconds
layer4.0.downsample.0 7.169530e-13: 100%|████████████████████| 10000/10000 [00:16<00:00, 598.93it/s]
INFO:[2021/06/07 17:24:26] SLS : 7.464063e-13 -> 7.169530e-13. Done in 16.70 seconds
INFO:[2021/06/07 17:24:26] layer4.0.downsample.0: prev covdet 1.327052e-12, new covdet: 7.169530e-13
INFO:[2021/06/07 17:24:26] Optimizing permutation for odict_keys(['layer4.1.conv1', 'fc']) with 6 parents
INFO:[2021/06/07 17:24:26] Optimizing permutation for fc
INFO:[2021/06/07 17:24:26] Greedy: 5.416843e-10 -> 5.294320e-10. Done in 0.08 seconds
fc 5.158992e-10: 100%|████████████████████| 10000/10000 [01:14<00:00, 134.48it/s]
INFO:[2021/06/07 17:25:41] SLS : 5.294320e-10 -> 5.158992e-10. Done in 74.37 seconds
INFO:[2021/06/07 17:25:41] fc: prev covdet 5.416843e-10, new covdet: 5.158992e-10
INFO:[2021/06/07 17:25:41] layer1.0.conv1 compression: 10; mse: 4.434068e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer1.0.conv2 compression: 10; mse: 3.580092e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer1.1.conv1 compression: 10; mse: 4.652465e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer1.1.conv2 compression: 10; mse: 3.743610e-03; codebook size: 256 x 9; code size: 64 x 64
INFO:[2021/06/07 17:25:41] layer2.0.conv1 compression: 10; mse: 3.415691e-03; codebook size: 256 x 9; code size: 128 x 64
INFO:[2021/06/07 17:25:41] layer2.0.conv2 compression: 10; mse: 2.482994e-03; codebook size: 256 x 9; code size: 128 x 128
INFO:[2021/06/07 17:25:41] layer2.0.downsample.0 compression: 10; mse: 1.262195e-03; codebook size: 256 x 4; code size: 128 x 16
INFO:[2021/06/07 17:25:41] layer2.1.conv1 compression: 10; mse: 2.888829e-03; codebook size: 256 x 9; code size: 128 x 128
INFO:[2021/06/07 17:25:41] layer2.1.conv2 compression: 10; mse: 2.073620e-03; codebook size: 256 x 9; code size: 128 x 128
INFO:[2021/06/07 17:25:41] layer3.0.conv1 compression: 10; mse: 1.135532e-03; codebook size: 256 x 9; code size: 256 x 128
INFO:[2021/06/07 17:25:41] layer3.0.conv2 compression: 10; mse: 1.265556e-03; codebook size: 256 x 9; code size: 256 x 256
INFO:[2021/06/07 17:25:41] layer3.0.downsample.0 compression: 10; mse: 3.959292e-04; codebook size: 256 x 4; code size: 256 x 32
INFO:[2021/06/07 17:25:41] layer3.1.conv1 compression: 10; mse: 1.148529e-03; codebook size: 256 x 9; code size: 256 x 256
INFO:[2021/06/07 17:25:41] layer3.1.conv2 compression: 10; mse: 8.906376e-04; codebook size: 256 x 9; code size: 256 x 256
INFO:[2021/06/07 17:25:41] layer4.0.conv1 compression: 10; mse: 6.173717e-04; codebook size: 256 x 9; code size: 512 x 256
INFO:[2021/06/07 17:25:41] layer4.0.conv2 compression: 10; mse: 6.496188e-04; codebook size: 256 x 9; code size: 512 x 512
INFO:[2021/06/07 17:25:41] layer4.0.downsample.0 compression: 10; mse: 3.887021e-04; codebook size: 256 x 4; code size: 512 x 64
INFO:[2021/06/07 17:25:41] layer4.1.conv1 compression: 10; mse: 4.462329e-04; codebook size: 256 x 9; code size: 512 x 512
INFO:[2021/06/07 17:25:41] layer4.1.conv2 compression: 10; mse: 6.987687e-05; codebook size: 256 x 9; code size: 512 x 512
INFO:[2021/06/07 17:25:44] fc compression: 10; mse: 6.214550e-04; codebook size: 2048 x 4; code size: 1000 x 128
INFO:[2021/06/07 17:25:44]
uncompressed (bits): 374064384
compressed (bits): 12927232
uncompressed (MB): 44.59
compressed (MB): 1.54
compression ratio: 28.94
Validation Epoch #0: 100%|████████████████████| 391/391 [00:36<00:00, 10.75it/s, loss=6.29, accuracy=5.57% (2784/50000)]
Validation Epoch #0: loss (6.29), accuracy (5.57)
Training Epoch #1: 78%|███████████████▋    | 7790/10010 [16:05<04:49, 7.68it/s, loss=7.2, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #1: 100%|████████████████████| 10010/10010 [20:40<00:00, 8.07it/s, loss=9.66, accuracy=0]
Training Epoch #1: loss: 7.78, accuracy: 0.07% (937/1281167)
Validation Epoch #1: 100%|████████████████████| 391/391 [00:43<00:00, 8.91it/s, loss=8.83, accuracy=0.08% (39/50000)]
Validation Epoch #1: loss (8.83), accuracy (0.08)
Training Epoch #2: 78%|███████████████▋    | 7794/10010 [16:26<04:52, 7.57it/s, loss=6.09, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #2: 100%|████████████████████| 10010/10010 [21:06<00:00, 7.90it/s, loss=8.14, accuracy=0]
Training Epoch #2: loss: 6.99, accuracy: 0.00% (2/1281167)
Validation Epoch #2: 100%|████████████████████| 391/391 [00:39<00:00, 9.83it/s, loss=8.17, accuracy=0.12% (58/50000)]
Validation Epoch #2: loss (8.17), accuracy (0.12)
Training Epoch #3: 78%|███████████████▋    | 7790/10010 [16:02<04:25, 8.36it/s, loss=7.01, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #3: 100%|████████████████████| 10010/10010 [20:35<00:00, 8.10it/s, loss=8.94, accuracy=0]
Training Epoch #3: loss: 6.59, accuracy: 0.01% (138/1281167)
Validation Epoch #3: 100%|████████████████████| 391/391 [00:35<00:00, 11.10it/s, loss=9.10, accuracy=0.08% (40/50000)]
Validation Epoch #3: loss (9.10), accuracy (0.08)
Training Epoch #4: 78%|███████████████▋    | 7790/10010 [16:09<04:28, 8.27it/s, loss=6.7, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #4: 100%|████████████████████| 10010/10010 [20:44<00:00, 8.04it/s, loss=6.46, accuracy=0]
Training Epoch #4: loss: 6.44, accuracy: 0.02% (262/1281167)
Validation Epoch #4: 100%|████████████████████| 391/391 [00:36<00:00, 10.77it/s, loss=9.12, accuracy=0.10% (49/50000)]
Validation Epoch #4: loss (9.12), accuracy (0.10)
Training Epoch #5: 78%|███████████████▋    | 7790/10010 [16:07<04:29, 8.24it/s, loss=6.06, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #5: 100%|████████████████████| 10010/10010 [20:43<00:00, 8.05it/s, loss=5.62, accuracy=0]
Training Epoch #5: loss: 6.43, accuracy: 0.02% (239/1281167)
Validation Epoch #5: 100%|████████████████████| 391/391 [00:47<00:00, 8.29it/s, loss=9.33, accuracy=0.10% (52/50000)]
Validation Epoch #5: loss (9.33), accuracy (0.10)
Training Epoch #6: 78%|███████████████▋    | 7790/10010 [16:04<04:20, 8.53it/s, loss=6.73, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #6: 100%|████████████████████| 10010/10010 [20:36<00:00, 8.10it/s, loss=12.6, accuracy=0]
Training Epoch #6: loss: 6.22, accuracy: 0.09% (1117/1281167)
Validation Epoch #6: 100%|████████████████████| 391/391 [00:35<00:00, 11.08it/s, loss=9.18, accuracy=0.09% (46/50000)]
Validation Epoch #6: loss (9.18), accuracy (0.09)
Training Epoch #7: 78%|███████████████▋    | 7790/10010 [16:02<04:41, 7.88it/s, loss=7.38, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #7: 100%|████████████████████| 10010/10010 [20:36<00:00, 8.10it/s, loss=6.59, accuracy=0]
Training Epoch #7: loss: 6.33, accuracy: 0.06% (776/1281167)
Validation Epoch #7: 100%|████████████████████| 391/391 [00:42<00:00, 9.20it/s, loss=8.28, accuracy=0.10% (50/50000)]
Validation Epoch #7: loss (8.28), accuracy (0.10)
Training Epoch #8: 78%|███████████████▋    | 7790/10010 [16:07<04:30, 8.21it/s, loss=7.35, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #8: 100%|████████████████████| 10010/10010 [20:43<00:00, 8.05it/s, loss=5.31, accuracy=0]
Training Epoch #8: loss: 6.73, accuracy: 0.06% (753/1281167)
Validation Epoch #8: 100%|████████████████████| 391/391 [00:34<00:00, 11.38it/s, loss=8.29, accuracy=0.10% (52/50000)]
Validation Epoch #8: loss (8.29), accuracy (0.10)
Training Epoch #9: 78%|███████████████▋    | 7790/10010 [16:12<04:31, 8.17it/s, loss=7.36, accuracy=0]
/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Training Epoch #9: 100%|████████████████████| 10010/10010 [20:46<00:00, 8.03it/s, loss=6.63, accuracy=0]
Training Epoch #9: loss: 7.25, accuracy: 0.02% (304/1281167)
Validation Epoch #9: 100%|████████████████████| 391/391 [00:36<00:00, 10.68it/s, loss=7.09, accuracy=0.08% (41/50000)]
Validation Epoch #9: loss (7.09), accuracy (0.08)
Done training!

This should be the whole log.
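The size summary printed before validation epoch #0 can be checked by hand. The sketch below re-derives it from the two bit counts, assuming the uncompressed model is stored as 32-bit floats and that "MB" means 2^20 bytes. Both are assumptions, but they reproduce the logged 44.59 MB, 1.54 MB and 28.94 ratio, and 374064384 / 32 gives 11,689,512 parameters, which is the parameter count of torchvision's ResNet-18.

```python
# Re-derive the compression summary from the bit counts in the log.
UNCOMPRESSED_BITS = 374_064_384
COMPRESSED_BITS = 12_927_232


def bits_to_mib(bits: int) -> float:
    """Convert bits to MB, assuming 1 MB = 2**20 bytes."""
    return bits / 8 / 2**20


num_fp32_params = UNCOMPRESSED_BITS // 32  # assumes every weight is stored as fp32
ratio = UNCOMPRESSED_BITS / COMPRESSED_BITS

print(f"fp32 parameters:   {num_fp32_params:,}")
print(f"uncompressed (MB): {bits_to_mib(UNCOMPRESSED_BITS):.2f}")
print(f"compressed (MB):   {bits_to_mib(COMPRESSED_BITS):.2f}")
print(f"compression ratio: {ratio:.2f}")
```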

@una-dinosauria
Contributor

It seems like the accuracy after the initial compression is too low (5%). I'll try to reproduce on my end, thanks for bringing this up!

@talenz
Author

talenz commented Jun 8, 2021

> It seems like the accuracy after the initial compression is too low (5%). I'll try to reproduce on my end, thanks for bringing this up!

Thanks! Looking forward to your update~

@lewin4

lewin4 commented Aug 3, 2021

I'm seeing a similar situation: the accuracy stays at 5% and never changes!

@una-dinosauria
Contributor

Are you seeing this with other models too, or just with ResNet18?
Could either of you provide a Docker image that reproduces the error? (I should have provided one to reproduce our experiments; sorry about that.)

@una-dinosauria
Contributor

Hello @talenz and @lewin4,

I have managed to reproduce the issue that you are reporting.

I apologize. Because we developed this code on machines running distributed training with Horovod, we missed a bug in the dataloader: as written, the ImageNet training dataloader does not shuffle the training data.
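Why a missing shuffle is so damaging here: a class-sorted dataset (torchvision's `ImageFolder`, for instance, lists samples grouped by class; that this repo's ImageNet dataset behaves the same way is an assumption) iterated in order with `batch_size=128` yields batches that almost always contain a single class. The log hints at a fixed order too, since the corrupt-EXIF warning fires at nearly the same batch (about 7790 of 10010) in every epoch. The stdlib-only sketch below simulates a hypothetical balanced split of 1000 classes × 1280 images (a simplification of the real train set) and counts distinct classes per batch with and without shuffling:

```python
import random

NUM_CLASSES = 1000
IMAGES_PER_CLASS = 1280  # hypothetical balanced split; real ImageNet classes hold roughly 700-1300 images
BATCH_SIZE = 128         # "batch_size": 128 in the config above

# Labels in the order a class-sorted (ImageFolder-style) dataset yields them.
labels = [c for c in range(NUM_CLASSES) for _ in range(IMAGES_PER_CLASS)]


def classes_per_batch(order, batch_size=BATCH_SIZE):
    """Count the distinct labels in each consecutive batch drawn in `order`."""
    return [
        len({labels[i] for i in order[start:start + batch_size]})
        for start in range(0, len(order), batch_size)
    ]


sequential = list(range(len(labels)))  # sampler=None, shuffle=False: dataset order
shuffled = sequential.copy()
random.Random(0).shuffle(shuffled)     # one random permutation, like shuffle=True draws each epoch

seq_counts = classes_per_batch(sequential)
shuf_counts = classes_per_batch(shuffled)

print("unshuffled: mean classes per batch =", sum(seq_counts) / len(seq_counts))
print("shuffled:   mean classes per batch =", sum(shuf_counts) / len(shuf_counts))
```

Unshuffled, every simulated batch holds exactly one class, so each optimizer step pulls the network toward that one label; this is consistent with the erratic loss and chance-level accuracy in the log above.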

You can replace

loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, sampler=sampler, pin_memory=True)

with

loader = DataLoader(
    dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    shuffle=(sampler is None),
    sampler=sampler,
    pin_memory=True
)

and that should bring back training numbers that make sense.
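The key detail in the patch is `shuffle=(sampler is None)`. PyTorch's `DataLoader` does not accept `shuffle=True` together with a custom `sampler` (it raises `ValueError`), and when neither is given it iterates the dataset in order. The expression therefore switches shuffling on only for single-GPU runs and leaves distributed runs to their own sampler. With Horovod that is typically `torch.utils.data.distributed.DistributedSampler`, which shuffles by default; whether this repo uses it is an assumption. The stdlib-only sketch below imitates that decision with a hypothetical `resolve_order` helper so it runs without PyTorch installed:

```python
import random
from typing import Iterable, List, Optional


def resolve_order(num_samples: int,
                  sampler: Optional[Iterable[int]] = None,
                  shuffle: bool = False,
                  seed: int = 0) -> List[int]:
    """Hypothetical stand-in for how a DataLoader chooses its index order.

    Mirrors PyTorch's rule that `sampler` and `shuffle=True` cannot be combined;
    it is an illustration only, not DataLoader's real implementation.
    """
    if sampler is not None and shuffle:
        raise ValueError("sampler option is mutually exclusive with shuffle")
    if sampler is not None:
        return list(sampler)                # e.g. indices from a distributed sampler
    order = list(range(num_samples))        # sequential order, the default
    if shuffle:
        random.Random(seed).shuffle(order)  # random permutation, like shuffle=True
    return order


# Single GPU, original call: no sampler and shuffle left at its default.
sampler = None
before_fix = resolve_order(10, sampler=sampler)
# Single GPU, patched call: shuffle=(sampler is None) switches shuffling on.
after_fix = resolve_order(10, sampler=sampler, shuffle=(sampler is None))
# Distributed run: a sampler is supplied, so shuffle=(sampler is None) stays False.
dist_sampler = [7, 3, 5]  # hypothetical per-rank indices
distributed = resolve_order(10, sampler=dist_sampler, shuffle=(dist_sampler is None))

print(before_fix)
print(after_fix)
print(distributed)
```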

Also, please note that I don't have write access to this repo anymore, so I will push patches to my personal fork at https://github.com/una-dinosauria/permute-quantize-finetune/. I'll let you know here when that repo is patched.

Again, sorry for this mistake, and thank you so much for reporting this issue.

Cheers,

@una-dinosauria
Contributor

Fixed by una-dinosauria#2.

Cheers,

@una-dinosauria
Contributor

I've also added a Docker image to make it easier to reproduce our results: una-dinosauria#3
