Fails to run for Mistral v0.2 #17

Open
etomoscow opened this issue Nov 7, 2024 · 0 comments
etomoscow commented Nov 7, 2024

Hello.

I am trying to reproduce your method on the Mistral-7B-Instruct-v0.2 LLM.

Here is the command I am running:

python3 SVDLLM.py \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --step 1 \
    --ratio 0.1 \
    --whitening_nsamples 256 \
    --dataset wikitext2 \
    --seed 3 \
    --model_seq_len 2048 \
    --save_path .

However, it fails on the tokenizer loading step:

/path/to/env/lib/python3.12/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/env/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 768, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/env/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/env/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/env/lib/python3.12/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 124, in __init__
    super().__init__(
  File "/path/to/env/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3

I think the problem is a transformers/tokenizers version mismatch: the traceback shows the crash happens while tokenizers deserializes the model's tokenizer.json. However, as you mention in your README, the code is only compatible with transformers 4.35.2, so I cannot simply upgrade.

Is there any workaround for this?
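For context, the same family of error can be reproduced without downloading the model, by feeding tokenizers a tokenizer.json whose pre_tokenizer it does not recognize. This is a sketch of my guess at the failure mode (the pre-tokenizer type below is made up; presumably the real trigger is a newer variant in Mistral's tokenizer.json that tokenizers 0.15.2 predates):

```python
# Sketch (assumption): tokenizers fails when tokenizer.json contains a
# pre_tokenizer variant it cannot deserialize. A minimal, made-up config
# triggers the same "untagged enum" error -- no model download needed.
import json
from tokenizers import Tokenizer

config = {
    "version": "1.0",
    "model": {"type": "WordLevel", "vocab": {"x": 0}, "unk_token": "x"},
    # A pre_tokenizer type the installed tokenizers cannot parse:
    "pre_tokenizer": {"type": "NotARealPreTokenizer"},
}

err = ""
try:
    Tokenizer.from_str(json.dumps(config))
except Exception as e:
    err = str(e)

print(err)  # a "data did not match any variant of untagged enum" message
```

If this is indeed the cause, I wondered whether loading the slow tokenizer instead (e.g. `AutoTokenizer.from_pretrained(..., use_fast=False)`) might sidestep `TokenizerFast.from_file`, but I have not verified that against SVDLLM.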

I am using:

transformers==4.35.2
tokenizers==0.15.2