
Guard against unset resolved_archive_file #35628

Open · dmlap wants to merge 2 commits into main

Conversation

@dmlap (Contributor) commented Jan 11, 2025

What does this PR do?

resolved_archive_file in _load_pretrained_model() appears to be optional. In my case, I was loading a model from a GGUF file and it was None:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(train_config.model_name,
                                             device_map='auto',
                                             gguf_file='llama3.2.gguf',
                                             offload_folder='offload')

In that case, resolved_archive_file ends up being None and the safetensors check raises an error. The change guards against that case and allows loading to continue.
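The guard amounts to a None check before the string test. A minimal sketch of the idea, with simplified names (this is not the actual transformers source):

```python
# Hypothetical sketch of the guard: the safetensors check should only fire
# when an archive file was actually resolved. With gguf_file loading,
# resolved_archive_file can be None, and calling .endswith() on None would
# raise a TypeError.
def is_safetensors_archive(resolved_archive_file):
    return (resolved_archive_file is not None
            and resolved_archive_file.endswith(".safetensors"))
```

The short-circuiting `and` means the `endswith` call is never reached when no archive file exists, so loading can continue down the GGUF path.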

I thought this change was minor enough that new tests were not warranted. If you feel otherwise, happy to add one if you can point me at the right place to do it.

@SunMarc (Member) left a comment

LGTM! This is quite an edge case, since resolved_archive_file = None only happens when loading with gguf while disk appears in the device_map. If you have time, please add a test in the tests/quantization/ggml/test_ggml.py file. cc @Isotr0py for visibility

@Isotr0py (Contributor) left a comment

LGTM too! Just need a test case to cover this edge case in test_ggml.py.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dmlap (Contributor, Author) commented Jan 14, 2025

Great! I should be able to add one sometime this week.

@dmlap (Contributor, Author) commented Jan 22, 2025

Putting together a test case was helpful. It appears this condition is only triggered when a GGUF is loaded that is configured to offload a portion of model.state_dict() to disk. modeling_utils::_load_state_dict_into_meta_model() doesn't move the loaded state into modules mapped to disk. With the guard on archive_file from this patch, execution continues until from_pretrained() gets around to calling dispatch_model(), which promptly blows up trying to offload a meta tensor to disk.

I've worked around this by forcing the entire state_dict to get loaded in when gguf_path is specified – that gets the test working and produces the expected output. I'm not sure if this defeats the point of offloading some of the modules to disk (or if I'm missing something more fundamental).

Thoughts or suggestions? I can update the PR with what I have if seeing the necessary changes is easier.

@SunMarc (Member) commented Jan 22, 2025

On reflection, I think we shouldn't allow offload with GGUF. With a gguf state_dict, we still have to modify the state dict to make it compatible with transformers, so we can't really offload to disk.
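The incompatibility comes down to naming: GGUF tensor names follow the llama.cpp convention, not the transformers one, so the whole state dict has to be rewritten in memory before loading. A toy illustration of that rewriting (the mapping table below is a hand-picked example, not the real transformers GGUF converter):

```python
# Illustrative only: a tiny rename table from gguf-style tensor names to
# transformers-style names. The real converter handles far more tensors
# and per-architecture rules.
GGUF_TO_TRANSFORMERS = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
}

def rename_gguf_keys(gguf_state_dict):
    # Rewrite every key that has a known mapping; leave the rest untouched.
    return {GGUF_TO_TRANSFORMERS.get(name, name): tensor
            for name, tensor in gguf_state_dict.items()}
```

Because this rewrite happens on the full in-memory dict, individual tensors can't simply be streamed straight into a disk-offload folder, which is the crux of SunMarc's point.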

@dmlap (Contributor, Author) commented Jan 22, 2025

That makes sense. Do you think that should happen in transformers or accelerate? If it’s in the model loading here, I don’t mind taking a crack at it.

@SunMarc (Member) commented Jan 23, 2025

The check should be in transformers!

Commit 1: When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.
@dmlap force-pushed the no-archive-no-safetensors branch 2 times, most recently from 10c685b to e9326f3, on February 1, 2025 03:46
Commit 2: GGUF files don't support disk offload, so attempt to remap them to the CPU when device_map is "auto". If device_map is anything other than None, raise a NotImplementedError.
@dmlap force-pushed the no-archive-no-safetensors branch from e9326f3 to 3c2066d on February 1, 2025 03:51
@dmlap (Contributor, Author) commented Feb 1, 2025

@SunMarc I modified the handling of device_map so that when "auto" is specified for a GGUF file, it will attempt to remap disk offload back to the CPU. If disk is explicitly configured, a NotImplementedError will be raised.

There’s a test case for the explicit disk mapping but there didn’t seem to be a non-invasive way of testing the auto remapping, and it didn’t seem worth it to me. Let me know if you disagree or would like any further modifications.
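The remapping behavior described above can be sketched as a small helper. This is a hypothetical illustration of the logic, not the code in the PR; the function name and parameters are invented:

```python
def remap_gguf_device_map(device_map, device_map_was_auto):
    """Hypothetical helper: GGUF loading cannot offload to disk, so when the
    user passed device_map="auto", any "disk" placements are quietly remapped
    to "cpu"; an explicitly requested disk placement raises instead."""
    remapped = {}
    for module, device in device_map.items():
        if device == "disk":
            if not device_map_was_auto:
                raise NotImplementedError(
                    "Disk offload is not supported when loading GGUF files."
                )
            remapped[module] = "cpu"  # fall back to CPU instead of disk
        else:
            remapped[module] = device
    return remapped
```

The design choice here is to be forgiving when the placement was computed automatically (the user never asked for disk) but loud when the user requested disk offload explicitly, since silently ignoring an explicit request would be surprising.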
