Hey,
I was running model evaluations on my own custom data-split for every <model> in the registry (listed via python db.py --list-models-registry).
However, for many of the models, I see a pickling error because the checkpoint is not loaded correctly. See the stack trace below:
Traceback (most recent call last):
  File "/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "imagenet-testbed/src/inference.py", line 64, in main_worker
    model = py_model.generate_classifier(py_eval_setting)
  File "imagenet-testbed/src/models/model_base.py", line 76, in generate_classifier
    self.classifier = self.classifier_loader()
  File "imagenet-testbed/src/models/low_accuracy.py", line 100, in load_resnet
    load_model_state_dict(net, model_name)
  File "imagenet-testbed/src/mldb/utils.py", line 98, in load_model_state_dict
    state_dict = torch.load(bio, map_location=f'cpu')
  File "/lib/python3.8/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/lib/python3.8/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
I see this error for all of the low-resource models, e.g. resnet18_100k_x_epochs, resnet18_50k_x_epochs, etc. To make sure this is not an artefact of my own custom data-split, I also tested on the imagenet-val split and hit the same error.
Are the low-resource models not available as checkpoints from the server?
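For what it's worth, the same error message can be reproduced by calling torch.load on an empty byte stream (at least with the torch version in the traceback above, which goes down the legacy serialization path), so it looks like the checkpoint bytes fetched for these models are empty rather than corrupted partway through. A minimal sketch, not using any of the repo's code:

```python
import io
import torch

# An empty byte stream takes the same _legacy_load path as in the traceback
# and fails while unpickling the magic number.
empty_checkpoint = io.BytesIO(b"")
try:
    torch.load(empty_checkpoint, map_location="cpu")
except EOFError as err:
    print(err)  # "Ran out of input"
```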
Also, another set of errors I get when running this comes from some checkpoints still being stored on the vasa endpoint:
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://vasa.millennium.berkeley.edu:9000/robustness-eval/checkpoints/3NL5sQy84F9nefxVCVDzew_data.bytes"
Are some of the checkpoints not migrated fully yet?
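In case it's useful, here is a rough sketch of how the affected models could be skipped so the rest of the registry still gets evaluated. evaluate_model is a hypothetical placeholder for however a single model actually gets run, not a function from the repo:

```python
import botocore.exceptions

def evaluate_available_models(model_names, evaluate_model):
    """Run evaluate_model on each name, skipping checkpoints that cannot be
    fetched (empty bytes -> EOFError, unreachable vasa endpoint -> timeout)."""
    skipped = []
    for name in model_names:
        try:
            evaluate_model(name)
        except (EOFError, botocore.exceptions.ConnectTimeoutError) as err:
            print(f"skipping {name}: {type(err).__name__}: {err}")
            skipped.append(name)
    return skipped
```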
Sorry for the long, verbose issue, but I hope we can get this resolved :)
Hi @vishaal27,
Unfortunately, some checkpoints are not online: they are still on vasa and have not been migrated to the gcloud bucket yet. I'm not sure if/when they'll come online, as the path to migrating them is not straightforward now that I have lost my Berkeley access :)