[BugFix] Model State Reload with Quantized Stubs in SparseAutoModelForCausalLM #2226

Merged (3 commits) on Apr 5, 2024

@rahul-tuli rahul-tuli commented Apr 5, 2024

Description

Identified a bug on the main branch: model state fails to reload when a quantized stub is passed to SparseAutoModelForCausalLM.from_pretrained(...). The reload_model_state method expects the weight files to be in a local directory and does not account for remotely hosted model directories.

Solution

Download the model directory before invoking reload_model_state, so that the weight files are available locally when the model state is reloaded.
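A minimal sketch of the idea, not the code in this PR: resolve the model path to a local directory before reloading, downloading from the Hugging Face Hub only when the path is not already local. The helper name `ensure_local_model_dir`, the `downloader` parameter, and the `WEIGHT_PATTERNS` list are hypothetical and chosen for illustration; `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
import os
from typing import Callable, Optional, Sequence

# Hypothetical pattern list (weights plus config files); not taken from the PR.
WEIGHT_PATTERNS = ("*.safetensors", "*.bin", "*.json")


def ensure_local_model_dir(
    model_path: str,
    allow_patterns: Sequence[str] = WEIGHT_PATTERNS,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Return a local directory holding the files for model_path."""
    if os.path.isdir(model_path):
        # Already a local directory, so there is nothing to download.
        return model_path

    if downloader is None:
        # Imported lazily so local paths work without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download

    # Fetch only files matching allow_patterns instead of the whole repository.
    return downloader(repo_id=model_path, allow_patterns=list(allow_patterns))
```

The returned directory could then be handed to the reload step in place of the original stub. Restricting the download with `allow_patterns` keeps it from pulling every file in the repository; the exact patterns a given model needs are an assumption here.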

Testing

Tested with the following script, confirming the fix resolves the issue:

from sparseml.transformers import SparseAutoModelForCausalLM

model_path = "mgoin/llama2.c-stories15M-quant-pt"
m1 = SparseAutoModelForCausalLM.from_pretrained(model_path)

Observations

Before the Fix:

Model state fails to reload because the weight files are not found locally, as the warning at the end of the log shows.

..
..
2024-04-05 17:33:33 sparseml.core.recipe.recipe INFO     Loading recipe from file /home/rahul/.cache/huggingface/hub/models--mgoin--llama2.c-stories15M-quant-pt/snapshots/aa70fc9dc46615b68f935fb5405ae7875b88b716/recipe.yaml
manager stage: Model structure initialized
2024-04-05 17:33:34 sparseml.pytorch.model_load.helpers INFO     Applied an unstaged recipe to the model at mgoin/llama2.c-stories15M-quant-pt
2024-04-05 17:33:34 sparseml.pytorch.model_load.helpers WARNING  Model state was not reloaded for SparseML: could not find model weights for mgoin/llama2.c-stories15M-quant-pt

After the Fix:

..
..
..
2024-04-05 21:17:16 sparseml.pytorch.model_load.helpers INFO     Reloaded model state after SparseML recipe structure modifications from /nm/drive0/rahul/.cache/huggingface/hub/models--mgoin--llama2.c-stories15M-quant-pt/snapshots/aa70fc9dc46615b68f935fb5405ae7875b88b716

Successfully reloaded model state with the fix.
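The before/after comparison above could also be checked programmatically rather than by reading the logs. The sketch below captures log records with the standard `logging` module and looks for the warning text. It assumes the name printed in the log lines (`sparseml.pytorch.model_load.helpers`) is the Python logger name; the helpers `capture_logs` and `reload_failed` are hypothetical.

```python
import logging
from contextlib import contextmanager


@contextmanager
def capture_logs(logger_name, level=logging.WARNING):
    """Collect records emitted on logger_name at or above level."""
    records = []

    class _ListHandler(logging.Handler):
        def emit(self, record):
            records.append(record)

    handler = _ListHandler(level=level)
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    try:
        yield records
    finally:
        # Always detach the handler so later logging is unaffected.
        logger.removeHandler(handler)


def reload_failed(records):
    """True if any record carries the 'not reloaded' warning text."""
    return any("Model state was not reloaded" in r.getMessage() for r in records)
```

A possible use is to wrap the `from_pretrained` call from the test script in `capture_logs("sparseml.pytorch.model_load.helpers")` and assert that `reload_failed(records)` is false.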


@rahul-tuli rahul-tuli self-assigned this Apr 5, 2024
@rahul-tuli rahul-tuli added the bug Something isn't working label Apr 5, 2024
@rahul-tuli rahul-tuli requested review from dsikka and horheynm April 5, 2024 18:46

@mgoin mgoin left a comment

The snapshot_download can pull down a lot of extra files since it downloads the whole folder. This might interact weirdly with the resolve_recipe call, which specifically tries to download the recipe. This works, but I would like to be a bit more selective with the download.

bfineran previously approved these changes Apr 5, 2024

@bfineran bfineran left a comment

LGTM pending @mgoin's comment

@rahul-tuli

> The snapshot_download can pull down a lot of extra files since it downloads the whole folder. This might interact weirdly with the resolve_recipe call, which specifically tries to download the recipe. This works, but I would like to be a bit more selective with the download.

Addressed in latest commit @mgoin

mgoin previously approved these changes Apr 5, 2024

@mgoin mgoin left a comment

Thanks a lot! It's additional complexity but I think good to have

@rahul-tuli

> Thanks a lot! It's additional complexity but I think good to have

It was a great callout. Really appreciate it.

@mgoin mgoin merged commit 88196d5 into main Apr 5, 2024
13 of 15 checks passed
@mgoin mgoin deleted the hf-stub-bugfix branch April 5, 2024 22:34