tokenizer_class: LlamaTokenizerFast becomes LlamaTokenizer after load + immediate save #35832
Comments
I don't understand why: transformers/src/transformers/tokenization_utils_base.py, lines 2459 to 2461, at commit 373e50e. I think the code should be using
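For context, here is a hedged sketch of the save-time logic being questioned; the permalinked lines are not quoted above, so this is a reconstruction rather than a verbatim excerpt. The idea is that the class name has its `Fast` suffix stripped before it is written to `tokenizer_config.json`.

```python
# Hedged reconstruction of the suffix-stripping behaviour under discussion;
# the exact code in tokenization_utils_base.py may differ.
def serialized_tokenizer_class(tokenizer) -> str:
    name = type(tokenizer).__name__              # e.g. "LlamaTokenizerFast"
    if name.endswith("Fast") and name != "PreTrainedTokenizerFast":
        name = name[: -len("Fast")]              # -> "LlamaTokenizer"
    return name
```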
Having a look, thanks!
TBH this has not been touched in 4 years, it seems. The main reason is that if you keep the non-fast class name you can still reload the tokenizer, and it does not really make a big difference.
Duplicating PR notes here: Some models, like the recently released ... Full context: we found this bug after using GPTQModel to quantize various DeepSeek models and using EvalPlus to run benchmarks on them. EvalPlus always loads the slow tokenizer and passes ... EDIT: Maybe we should just remove this
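To make the downstream impact concrete, here is a hedged illustration; the path is a placeholder, and the failure mode is inferred from the report above rather than verified here.

```python
# Hedged illustration of why the stripped class name matters downstream;
# "./resaved-checkpoint" is a placeholder for a directory produced by
# tokenizer.save_pretrained(...).
from transformers import AutoTokenizer

# The default (fast) reload generally still works, since AutoTokenizer prefers
# the fast implementation when a tokenizer.json file is present.
fast_tok = AutoTokenizer.from_pretrained("./resaved-checkpoint")

# A harness that forces the slow tokenizer (as EvalPlus reportedly does) now
# resolves the re-saved "LlamaTokenizer" class; for checkpoints that ship only
# tokenizer.json and no sentencepiece model file, this can fail or change
# tokenization relative to the original checkpoint.
slow_tok = AutoTokenizer.from_pretrained("./resaved-checkpoint", use_fast=False)
```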
System Info
I do not understand why, but saving a loaded tokenizer changes the tokenizer class type. I am unsure whether this is a usage error on my part or expected output from HF.
Who can help?
@ArthurZucker @itazap
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
tokenizer_config.json before save:
"tokenizer_class": "LlamaTokenizerFast"
tokenizer_config.json after save:
"tokenizer_class": "LlamaTokenizer"
Expected behavior
The tokenizer class stays the same after saving.