
Guard against unset resolved_archive_file #35628

Open · dmlap wants to merge 2 commits into main

Conversation

@dmlap (Contributor) commented Jan 11, 2025

What does this PR do?

resolved_archive_file in _load_pretrained_model() appears to be optional. In my case, I was loading a model from a GGUF file and it was None:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(train_config.model_name,
                                             device_map='auto',
                                             gguf_file='llama3.2.gguf',
                                             offload_folder='offload')

In that case, resolved_archive_file ends up being None and the safetensors check raises an error. The change guards against that case and allows loading to continue.
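The guard amounts to a None check before the string test. A minimal sketch of the idea, with simplified names (this is not the actual transformers source):

```python
# Hypothetical sketch of the guard: the safetensors check should only fire
# when an archive file was actually resolved. With gguf_file loading,
# resolved_archive_file can be None, and calling .endswith() on None would
# raise a TypeError.
def is_safetensors_archive(resolved_archive_file):
    return (resolved_archive_file is not None
            and resolved_archive_file.endswith(".safetensors"))
```

The short-circuiting `and` means the `endswith` call is never reached when no archive file exists, so loading can continue down the GGUF path.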

I thought this change was minor enough that new tests were not warranted. If you feel otherwise, happy to add one if you can point me at the right place to do it.

@SunMarc (Member) left a comment

LGTM! This is quite an edge case, since resolved_archive_file = None only happens when loading with gguf while disk appears in the device_map. If you have time, please add a test in the tests/quantization/ggml/test_ggml.py file. cc @Isotr0py for visibility

@Isotr0py (Contributor) left a comment

LGTM too! Just need a test case to cover this edge case in test_ggml.py.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dmlap (Contributor, Author) commented Jan 14, 2025

Great! I should be able to add one sometime this week.

@dmlap (Contributor, Author) commented Jan 22, 2025

Putting together a test case was helpful. It appears this condition is only triggered when a GGUF is loaded that is configured to offload a portion of model.state_dict() to disk. modeling_utils::_load_state_dict_into_meta_model() doesn't move the loaded state into modules mapped to disk. With the guard on archive_file from this patch, execution continues until from_pretrained() gets around to calling dispatch_model(), which promptly blows up trying to offload a meta tensor to disk.

I've worked around this by forcing the entire state_dict to get loaded in when gguf_path is specified – that gets the test working and produces the expected output. I'm not sure if this defeats the point of offloading some of the modules to disk (or if I'm missing something more fundamental).

Thoughts or suggestions? I can update the PR with what I have if seeing the necessary changes is easier.

@SunMarc (Member) commented Jan 22, 2025

On reflection, I think we shouldn't allow offload with GGUF. With a gguf state_dict, we still have to modify the state dict to make it compatible with transformers, so we can't really offload to disk.
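The incompatibility comes down to naming: GGUF tensor names follow the llama.cpp convention, not the transformers one, so the whole state dict has to be rewritten in memory before loading. A toy illustration of that rewriting (the mapping table below is a hand-picked example, not the real transformers GGUF converter):

```python
# Illustrative only: a tiny rename table from gguf-style tensor names to
# transformers-style names. The real converter handles far more tensors
# and per-architecture rules.
GGUF_TO_TRANSFORMERS = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
}

def rename_gguf_keys(gguf_state_dict):
    # Rewrite every key that has a known mapping; leave the rest untouched.
    return {GGUF_TO_TRANSFORMERS.get(name, name): tensor
            for name, tensor in gguf_state_dict.items()}
```

Because this rewrite happens on the full in-memory dict, individual tensors can't simply be streamed straight into a disk-offload folder, which is the crux of SunMarc's point.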

@dmlap (Contributor, Author) commented Jan 22, 2025

That makes sense. Do you think that should happen in transformers or accelerate? If it’s in the model loading here, I don’t mind taking a crack at it.

@SunMarc (Member) commented Jan 23, 2025

The check should be in transformers!

Commit 1: When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.
@dmlap force-pushed the no-archive-no-safetensors branch 2 times, most recently from 10c685b to e9326f3, on February 1, 2025 03:46
Commit 2: GGUF files don't support disk offload, so attempt to remap them to the CPU when device_map is "auto". If device_map is anything other than None, raise a NotImplementedError.
@dmlap force-pushed the no-archive-no-safetensors branch from e9326f3 to 3c2066d on February 1, 2025 03:51
@dmlap (Contributor, Author) commented Feb 1, 2025

@SunMarc I modified the handling of device_map so that when "auto" is specified for a GGUF file, it will attempt to remap disk offload back to the CPU. If disk is explicitly configured, a NotImplementedError will be raised.

There’s a test case for the explicit disk mapping but there didn’t seem to be a non-invasive way of testing the auto remapping, and it didn’t seem worth it to me. Let me know if you disagree or would like any further modifications.
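The remapping behavior described above can be sketched as a small helper. This is a hypothetical illustration of the logic, not the code in the PR; the function name and parameters are invented:

```python
def remap_gguf_device_map(device_map, device_map_was_auto):
    """Hypothetical helper: GGUF loading cannot offload to disk, so when the
    user passed device_map="auto", any "disk" placements are quietly remapped
    to "cpu"; an explicitly requested disk placement raises instead."""
    remapped = {}
    for module, device in device_map.items():
        if device == "disk":
            if not device_map_was_auto:
                raise NotImplementedError(
                    "Disk offload is not supported when loading GGUF files."
                )
            remapped[module] = "cpu"  # fall back to CPU instead of disk
        else:
            remapped[module] = device
    return remapped
```

The design choice here is to be forgiving when the placement was computed automatically (the user never asked for disk) but loud when the user requested disk offload explicitly, since silently ignoring an explicit request would be surprising.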
