Fails to run for Mistral v0.2 #17

Open
etomoscow opened this issue Nov 7, 2024 · 0 comments
etomoscow commented Nov 7, 2024

Hello.

I am trying to reproduce your method on the Mistral-7B-Instruct-v0.2 LLM.

Here is the command I am running:

python3 SVDLLM.py \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --step 1 \
    --ratio 0.1 \
    --whitening_nsamples 256 \
    --dataset wikitext2 \
    --seed 3 \
    --model_seq_len 2048 \
    --save_path .

However, it fails on the tokenizer loading step:

/path/to/env/lib/python3.12/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/env/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 768, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/env/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/env/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/env/lib/python3.12/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 124, in __init__
    super().__init__(
  File "/path/to/env/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3

I think the problem is a transformers/tokenizers version mismatch: the traceback shows the crash happens while tokenizers deserializes the model's tokenizer.json. However, as you mention in your README, the code is only compatible with transformers 4.35.2, so I cannot simply upgrade.

Is there any workaround for this?
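For context, the same family of error can be reproduced without downloading the model, by feeding tokenizers a tokenizer.json whose pre_tokenizer it does not recognize. This is a sketch of my guess at the failure mode (the pre-tokenizer type below is made up; presumably the real trigger is a newer variant in Mistral's tokenizer.json that tokenizers 0.15.2 predates):

```python
# Sketch (assumption): tokenizers fails when tokenizer.json contains a
# pre_tokenizer variant it cannot deserialize. A minimal, made-up config
# triggers the same "untagged enum" error -- no model download needed.
import json
from tokenizers import Tokenizer

config = {
    "version": "1.0",
    "model": {"type": "WordLevel", "vocab": {"x": 0}, "unk_token": "x"},
    # A pre_tokenizer type the installed tokenizers cannot parse:
    "pre_tokenizer": {"type": "NotARealPreTokenizer"},
}

err = ""
try:
    Tokenizer.from_str(json.dumps(config))
except Exception as e:
    err = str(e)

print(err)  # a "data did not match any variant of untagged enum" message
```

If this is indeed the cause, I wondered whether loading the slow tokenizer instead (e.g. `AutoTokenizer.from_pretrained(..., use_fast=False)`) might sidestep `TokenizerFast.from_file`, but I have not verified that against SVDLLM.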

I am using:

transformers==4.35.2
tokenizers==0.15.2