-
Notifications
You must be signed in to change notification settings - Fork 27.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guard against unset resolved_archive_file #35628
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! This is quite an edge case as we have resolved_archive_file = None
when loading with gguf + we have disk
in device_map
. If you have time, please add a test in the tests/quantization/ggml/test_ggml.py file. cc @Isotr0py for visibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too! Just need a test case to cover this edge case in test_ggml.py
.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Great! I should be able to add one sometime this week |
Putting together a test case was helpful. It appears this condition is only triggered when a GGUF is loaded that is configured to offload a portion of I've worked around this by forcing the entire Thoughts or suggestions? I can update the PR with what I have if seeing the necessary changes is easier. |
After reflection, I think that we shouldn't allow offload with GGUF. This is because with gguf state_dict, we still have to modify the state dict to be compatible with transformers. So we can't really offload to disk. |
That makes sense. Do you think that should happen in transformers or accelerate? If it’s in the model loading here, I don’t mind taking a crack at it. |
The check should be in transformers ! |
When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.
10c685b
to
e9326f3
Compare
GGUF files don't support disk offload so attempt to remap them to the CPU when device_map is auto. If device_map is anything else but None, raise a NotImplementedError.
e9326f3
to
3c2066d
Compare
@SunMarc I modified the handling of There’s a test case for the explicit disk mapping but there didn’t seem to be a non-invasive way of testing the |
What does this PR do?
resolved_archive_file
in_load_pretrained_model()
appears to be optional. In my case, I was loading a model from a GGUF file and it wasNone
:In that case,
archive_file
ends up beingNone
and the check for safe tensors raises an error. The change guards against that case and allows loading to continue.I thought this change was minor enough that new tests were not warranted. If you feel otherwise, happy to add one if you can point me at the right place to do it.