Help with loading model #1499
I got your code to work like this:

```python
import importlib
import transformers
import transformers.models.llama.modeling_llama

# Reload the module so unsloth's patches are replaced by the stock definitions
modeling_llama = importlib.reload(transformers.models.llama.modeling_llama)

# checkpoint_path is the directory the unsloth model was saved to
model = modeling_llama.LlamaForCausalLM.from_pretrained(checkpoint_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint_path)

# Text generation
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_length=200,
    num_return_sequences=1,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nGenerated text:")
print(generated_text)
```

I will try to find a more general solution.
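In the meantime, a quick way to sanity-check that the reload actually stripped the patch (an untested sketch; it assumes unsloth replaces `forward` with a wrapper function defined in its own modules):

```python
import importlib
import transformers.models.llama.modeling_llama as modeling_llama

# Re-execute the module source so the stock class definitions come back
modeling_llama = importlib.reload(modeling_llama)

# After a successful reload this should print "LlamaForCausalLM.forward",
# not the qualified name of an unsloth fast-forward wrapper
print(modeling_llama.LlamaForCausalLM.forward.__qualname__)
```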
Thanks for this @KareemMusleh! This indeed works for the generation case. However, it fails when I use the same reload trick with lm_eval's `simple_evaluate`.

Script:

```python
import importlib

from lm_eval import simple_evaluate
from unsloth import FastLanguageModel

# Load with unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./llama-3-2-1b-instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)

# Save the model and tokenizer
checkpoint_path = "other_base_llama_3b_instruct"
model.save_pretrained(checkpoint_path)
tokenizer.save_pretrained(checkpoint_path)

# Reload the transformers library to remove unsloth patches
import transformers
importlib.reload(transformers)
importlib.reload(transformers.models.llama.modeling_llama)

# Evaluate the model
results = simple_evaluate(
    model="hf",
    model_args=f"pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)
```

Terminal output:

```
root@20ba2c8eeea4:/app# python unslo.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2025.1.1: Fast Llama patching. Transformers: 4.46.2.
\\ /| GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2025-01-08:12:11:12,015 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-08:12:11:12,015 INFO [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2025-01-08:12:11:12,056 INFO [huggingface.py:132] Using device 'cuda'
2025-01-08:12:11:12,571 INFO [huggingface.py:369] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2025-01-08:12:11:22,748 INFO [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3227.23it/s]
2025-01-08:12:11:22,785 INFO [evaluator.py:496] Running loglikelihood requests
Running loglikelihood requests: 0%| | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/app/unslo.py", line 27, in <module>
results = simple_evaluate(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 303, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 507, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1166, in _loglikelihood_tokens
self._model_call(batched_inps, **call_kwargs), dim=-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 856, in _model_call
return self.model(inps).logits
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 986, in _CausalLM_fast_forward
outputs = self.model(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaModel.forward() got an unexpected keyword argument 'causal_mask'
Running loglikelihood requests: 0%| | 0/400 [00:00<?, ?it/s]
```

If I don't add the reload lines (i.e. I remove the two `importlib.reload` calls), I get:

```
root@20ba2c8eeea4:/app# python unslo.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2025.1.1: Fast Llama patching. Transformers: 4.46.2.
\\ /| GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2025-01-08:12:13:23,134 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-08:12:13:23,134 INFO [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2025-01-08:12:13:23,172 INFO [huggingface.py:132] Using device 'cuda'
2025-01-08:12:13:23,698 INFO [huggingface.py:369] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2025-01-08:12:13:33,457 INFO [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3211.98it/s]
2025-01-08:12:13:33,494 INFO [evaluator.py:496] Running loglikelihood requests
Running loglikelihood requests: 0%| | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/app/unslo.py", line 24, in <module>
results = simple_evaluate(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 303, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 507, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1166, in _loglikelihood_tokens
self._model_call(batched_inps, **call_kwargs), dim=-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 856, in _model_call
return self.model(inps).logits
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 986, in _CausalLM_fast_forward
outputs = self.model(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 849, in LlamaModel_fast_forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 508, in LlamaDecoderLayer_fast_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 371, in LlamaAttention_fast_forward
Q, K, V = self.apply_qkv(self, hidden_states)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1931, in __getattr__
raise AttributeError(
AttributeError: 'LlamaSdpaAttention' object has no attribute 'apply_qkv'
Running loglikelihood requests: 0%| | 0/400 [00:01<?, ?it/s]
```

So it definitely has an effect. I guess this would work if I just knew all of the parts of transformers that I had to reload.
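For reference, a brute-force version of that idea would be to reload every transformers submodule that is already imported. This is an untested sketch of "reload everything" (deepest modules first), and it still would not undo anything unsloth patches outside of transformers:

```python
import importlib
import sys

# Snapshot the imported transformers submodules, deepest first, so leaf
# modules are re-executed before the packages that re-export them
names = sorted(
    (n for n in sys.modules if n == "transformers" or n.startswith("transformers.")),
    key=lambda n: n.count("."),
    reverse=True,
)

for name in names:
    module = sys.modules.get(name)
    if module is None:
        continue
    try:
        importlib.reload(module)
    except Exception:
        # Some modules have import-time side effects and may not survive a reload
        pass
```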
Apologies for the delay - if reloading does not work, I suggest using […]. The issue is that Unsloth directly patches torch, transformers, and many other libraries, so errors might be pervasive - sorry about the issue.
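One concrete way to sidestep the patching entirely (a sketch of the general idea, not an official unsloth recipe) is to do the unsloth save in one script and then run the evaluation in a fresh subprocess that never imports unsloth, so its torch and transformers stay unpatched:

```python
import subprocess
import sys

checkpoint_path = "other_base_llama_3b_instruct"

# The child interpreter starts clean: nothing in it is patched by unsloth
subprocess.run(
    [
        sys.executable, "-m", "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={checkpoint_path}",
        "--tasks", "tinyMMLU",
    ],
    check=True,
)
```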
See #1410; it has not been resolved yet.