Feature Request: MiniMax-Text-01 model #11290
Very interested in this model!
I have something more or less working here: https://github.com/fairydreaming/llama.cpp/tree/minimax-text-01

Some major remaining problems:
I tested it on CPU (AMD Epyc 9374F, Q5_K_M), some token generation performance values:
I used my custom llama-bench test to measure the generation rate at a given prompt length.
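For anyone who wants to approximate a similar measurement without the custom llama-bench variant, here is a rough sketch using the llama-cpp-python bindings (the model file name and prompt length are placeholders, and this is only an approximation of what the custom benchmark measures):

```python
import time
from llama_cpp import Llama  # assumes llama-cpp-python built against a branch with the model support

# Hypothetical GGUF file name; use whatever you converted locally.
llm = Llama(model_path="minimax-text-01-Q5_K_M.gguf", n_ctx=8192)

prompt = "the quick brown fox jumps over the lazy dog. " * 200  # crude long prompt
n_gen = 128

# Stream so that prompt processing and token generation can be timed separately.
first_token_at = None
count = 0
start = time.perf_counter()
for _ in llm(prompt, max_tokens=n_gen, temperature=0.0, stream=True):
    now = time.perf_counter()
    if first_token_at is None:
        first_token_at = now  # prompt processing ends roughly here
    count += 1

gen_time = time.perf_counter() - first_token_at
print(f"prompt processing: {first_token_at - start:.2f}s, "
      f"generation: ~{max(count - 1, 1) / gen_time:.2f} tokens/s")
```

The stock llama-bench tool exposes comparable prompt-length/generation-length settings via its `-p`/`-n` options, if you prefer to stay in C++ land.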
Yup, it's infeasible to keep trying to fit all variants of attention into the existing KV cache code. I am hoping that after the refactoring in #11213, we will be able to implement custom attention mechanisms for use cases like these.
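For context on why the existing KV cache is an awkward fit: most of MiniMax-Text-01's layers use lightning (linear) attention, which carries a fixed-size recurrent state instead of a growing K/V history. A rough numpy illustration of that difference (not the exact MiniMax formulation, just the general shape of the problem):

```python
import numpy as np

d = 64  # head dimension

# Softmax attention: the per-layer cache grows with sequence length.
k_cache, v_cache = [], []
def softmax_attn_step(q, k, v):
    k_cache.append(k); v_cache.append(v)
    K, V = np.stack(k_cache), np.stack(v_cache)   # shape (t, d): grows every step
    w = np.exp(q @ K.T / np.sqrt(d)); w /= w.sum()
    return w @ V

# Linear ("lightning"-style) attention: a constant-size (d, d) state is updated
# in place, so there is nothing sequence-shaped to put into a KV cache.
state = np.zeros((d, d))
def linear_attn_step(q, k, v):
    global state
    state = state + np.outer(k, v)                # fixed-size recurrent update
    return q @ state

q = k = v = np.random.randn(d)
print(softmax_attn_step(q, k, v).shape, linear_attn_step(q, k, v).shape)
```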
I noticed a problem with the model "eating" some words when asked to repeat text (Q5_K_M quant). Can someone with more RAM (like 512GB or 1TB) test this model with my branch? I'm not sure if the model is very sensitive to quantization or if there is some other problem. The full prompt is:
while the model's answer is:
There is one missing "of" in front of "human nature" and another "of" in front of "rest and health". Sometimes it eats "and" instead, or both. A hungry model. I ran it with temp 0.01. I'm curious whether it also happens with f16 or Q8_0 quantization.
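A quick way to pin down exactly which words get dropped is to diff the source text against the model's repetition; a small self-contained sketch (the two strings below are placeholders, paste the real prompt and output):

```python
import difflib

# Placeholder strings; substitute the actual prompt text and the model's repetition.
original = "the study of human nature and the value of rest and health"
repeated = "the study human nature and the value rest and health"

diff = difflib.ndiff(original.split(), repeated.split())
missing = [tok[2:] for tok in diff if tok.startswith("- ")]
print("words dropped by the model:", missing)
```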
I have 1 TB of RAM, I can try it.
I found about
@fairydreaming I tested your branch with Q5_K_M. On my setup I also see some missing "of"s.
Q5_K_M:
That would be helpful, thanks. Regarding the command line, I can't access the workstation right now; I will add it later.
file format = GGUF V3 (latest)

Full log:
Summary of Rounds and Missing Words

Across the four rounds, the text provided by the user was analyzed for differences in word usage. Here's a concise summary of the missing words in each round and how they evolved:

Round 1: Missing Words:

Round 2: Missing Words:

Round 3:

Round 4:

Summary of All Missing Words: From Rounds 1 and 2, the following words were missing:

In Rounds 3 and 4, no words were missing, indicating that the AI eventually reproduced the original text without errors.
@fairydreaming I found a possible issue with that; I need to reconvert the model again. See you soon.
Still the same issue. I removed ignore_merges from llama-vocab.cpp and redid the conversion and quant, but no success. 'of' and
@Nondzu OK, if it happens on Q8_0 as well, then there's likely still some problem with my inference code, as I didn't observe this behavior via the OpenRouter API. Thanks for testing!
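One way to separate a tokenizer/merges problem (like the ignore_merges change above) from an inference bug is to round-trip the prompt through the reference Hugging Face tokenizer and the converted GGUF vocabulary and compare the token IDs. A hedged sketch, assuming the transformers and llama-cpp-python packages and the hypothetical file name below:

```python
from transformers import AutoTokenizer
from llama_cpp import Llama

text = "the study of human nature and the value of rest and health"

# Reference tokenization from the original Hugging Face checkpoint.
hf_tok = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)
hf_ids = hf_tok.encode(text, add_special_tokens=False)

# Tokenization through the converted GGUF (hypothetical file name; vocab_only skips loading weights).
llm = Llama(model_path="minimax-text-01-Q8_0.gguf", vocab_only=True)
gg_ids = llm.tokenize(text.encode("utf-8"), add_bos=False)

print("match" if hf_ids == gg_ids else f"mismatch:\n{hf_ids}\n{gg_ids}")
```

If the IDs match, the dropped words are more likely an inference-side issue than a vocabulary/merges one.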
Prerequisites
Feature Description
Please add support for the MiniMax-Text-01 model: https://huggingface.co/MiniMaxAI/MiniMax-Text-01
https://github.com/MiniMax-AI/MiniMax-01
Motivation
We need to add support for the latest models! It performs almost as well as DeepSeek V3, but has a 4-million-token context window.
Possible Implementation
It's a MoE model.
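For reference, the general shape of a top-k MoE feed-forward block is sketched below in numpy; the actual expert count, top-k value, and gating details for MiniMax-Text-01 would have to be read from its config, so treat the sizes here as placeholders.

```python
import numpy as np

d_model, n_experts, top_k = 64, 8, 2  # placeholder sizes, not MiniMax-Text-01's real config

rng = np.random.default_rng(0)
w_gate = rng.standard_normal((d_model, n_experts))
experts = [  # each expert is a tiny two-layer MLP
    (rng.standard_normal((d_model, 4 * d_model)), rng.standard_normal((4 * d_model, d_model)))
    for _ in range(n_experts)
]

def moe_forward(x):
    logits = x @ w_gate                          # router scores per expert
    top = np.argsort(logits)[-top_k:]            # pick the top-k experts for this token
    gates = np.exp(logits[top]); gates /= gates.sum()
    out = np.zeros(d_model)
    for g, e in zip(gates, top):
        w1, w2 = experts[e]
        out += g * (np.maximum(x @ w1, 0) @ w2)  # gated sum of the selected experts' outputs
    return out

print(moe_forward(rng.standard_normal(d_model)).shape)  # (64,)
```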