Help with loading model #1499

Open
Axel-At-Apollo opened this issue Jan 3, 2025 · 3 comments
Comments

@Axel-At-Apollo commented Jan 3, 2025

See #1410; it has not been resolved yet.

@KareemMusleh (Contributor)

I got your code to work like this:

import importlib

import transformers

# Reload the modeling module to drop Unsloth's monkey patches
modeling_llama = importlib.reload(transformers.models.llama.modeling_llama)

# checkpoint_path is the directory saved earlier with save_pretrained
model = modeling_llama.LlamaForCausalLM.from_pretrained(checkpoint_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint_path)

# Generate text with the reloaded (unpatched) model
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_length=200,
    num_return_sequences=1,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nGenerated text:")
print(generated_text)

I will try to find a more general solution.

@Axel-At-Apollo (Author)

Thanks for this @KareemMusleh! This indeed works for the generation case. However, when trying to use lm_eval (see the first comment in #1410), I still get an error. Do you perhaps know what other parts of transformers I need to importlib.reload() to get that working?

Script:

import importlib

from lm_eval import simple_evaluate
from unsloth import FastLanguageModel

# Load with unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./llama-3-2-1b-instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)

# Save the model and tokenizer
checkpoint_path = "other_base_llama_3b_instruct"
model.save_pretrained(checkpoint_path)
tokenizer.save_pretrained(checkpoint_path)

# Reload the transformers library to remove unsloth patches
import transformers

importlib.reload(transformers)
importlib.reload(transformers.models.llama.modeling_llama)


# Evaluate the model
results = simple_evaluate(
    model="hf",
    model_args=f"pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)

Terminal Output:

root@20ba2c8eeea4:/app# python unslo.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.1: Fast Llama patching. Transformers: 4.46.2.
   \\   /|    GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2025-01-08:12:11:12,015 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-08:12:11:12,015 INFO     [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2025-01-08:12:11:12,056 INFO     [huggingface.py:132] Using device 'cuda'
2025-01-08:12:11:12,571 INFO     [huggingface.py:369] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2025-01-08:12:11:22,748 INFO     [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3227.23it/s]
2025-01-08:12:11:22,785 INFO     [evaluator.py:496] Running loglikelihood requests
Running loglikelihood requests:   0%|                                                      | 0/400 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/app/unslo.py", line 27, in <module>
    results = simple_evaluate(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 303, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 507, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
    return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1166, in _loglikelihood_tokens
    self._model_call(batched_inps, **call_kwargs), dim=-1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 856, in _model_call
    return self.model(inps).logits
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 986, in _CausalLM_fast_forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaModel.forward() got an unexpected keyword argument 'causal_mask'
Running loglikelihood requests:   0%|                                                      | 0/400 [00:00<?, ?it/s]

If I don't add the reload lines (i.e. importlib.reload(transformers) and importlib.reload(transformers.models.llama.modeling_llama)), but otherwise keep the script the same, I get a different error:

root@20ba2c8eeea4:/app# python unslo.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.1: Fast Llama patching. Transformers: 4.46.2.
   \\   /|    GPU: NVIDIA A10G. Max memory: 21.975 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2025-01-08:12:13:23,134 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-08:12:13:23,134 INFO     [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'other_base_llama_3b_instruct'}
2025-01-08:12:13:23,172 INFO     [huggingface.py:132] Using device 'cuda'
2025-01-08:12:13:23,698 INFO     [huggingface.py:369] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2025-01-08:12:13:33,457 INFO     [task.py:415] Building contexts for tinyMMLU on rank 0...
100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3211.98it/s]
2025-01-08:12:13:33,494 INFO     [evaluator.py:496] Running loglikelihood requests
Running loglikelihood requests:   0%|                                                      | 0/400 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/app/unslo.py", line 24, in <module>
    results = simple_evaluate(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 303, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 401, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 507, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 378, in loglikelihood
    return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 1166, in _loglikelihood_tokens
    self._model_call(batched_inps, **call_kwargs), dim=-1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/huggingface.py", line 856, in _model_call
    return self.model(inps).logits
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 986, in _CausalLM_fast_forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 849, in LlamaModel_fast_forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 508, in LlamaDecoderLayer_fast_forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/llama.py", line 371, in LlamaAttention_fast_forward
    Q, K, V = self.apply_qkv(self, hidden_states)
              ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1931, in __getattr__
    raise AttributeError(
AttributeError: 'LlamaSdpaAttention' object has no attribute 'apply_qkv'
Running loglikelihood requests:   0%|                                                      | 0/400 [00:01<?, ?it/s]

So it definitely has an effect. I guess this would work if I just knew all of the parts of transformers that I had to reload.
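
For reference, a brute-force sketch of that idea is to walk sys.modules and reload every transformers submodule. This is only a sketch, untested against this setup: reload order matters, and stale references held by other modules (e.g. unsloth itself) can survive the reload, so it may still fail.

import importlib
import sys

# Reload every transformers submodule already present in sys.modules.
# sorted() reloads parent packages before their children.
for name in sorted(
    k for k in sys.modules
    if k == "transformers" or k.startswith("transformers.")
):
    module = sys.modules.get(name)
    if module is not None:
        try:
            importlib.reload(module)
        except (ImportError, TypeError):
            # Some entries (lazy modules, namespace packages) can't be reloaded
            pass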

@danielhanchen (Contributor)

Apologies for the delay - if reloading does not work, I suggest using subprocess to launch another Python process from within Python - that should make it work.

The issue is that Unsloth directly patches torch, transformers, and many other libraries, so errors might be pervasive - sorry about the issue.
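
For example, here is a minimal sketch of that workaround, reusing the checkpoint path and task from the script above. The child interpreter imports a clean transformers that Unsloth never patched:

import subprocess
import sys

checkpoint_path = "other_base_llama_3b_instruct"

# Run lm_eval in a fresh interpreter so Unsloth's patches (applied in
# this process) never reach the transformers the child process imports.
eval_code = f"""
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained={checkpoint_path}",
    tasks=["tinyMMLU"],
)
print(results["results"])
"""
subprocess.run([sys.executable, "-c", eval_code], check=True)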
