Add support for MiniMax-Text-01 and MiniMax-VL-01 from MiniMaxAI #35710

Open · 2 tasks done
geetu040 opened this issue Jan 15, 2025 · 8 comments · May be fixed by #35831

Comments

@geetu040 commented Jan 15, 2025

Model description

MiniMaxAI has just released two new models, a text-generation model and a vision-language model. While the code and weights have been made publicly available, the code requires significant formatting and cleaning to align with the standards of the Hugging Face Transformers library. The models are:

MiniMax-Text-01

MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long-context capabilities of the model, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), its training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.
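As a purely illustrative aside, the hybrid stack described above can be pictured with a small schematic. None of this is MiniMax's released code: the interleave ratio, the top-1 routing, and the kernelized stand-in for Lightning Attention are assumptions, meant only to show how linear-attention blocks, softmax-attention blocks, and an MoE feed-forward might be combined in one decoder stack.

```python
# Schematic only: a toy decoder stack interleaving a linear-attention stand-in
# with softmax attention, each block ending in a sparsely activated MoE MLP.
# Layer count, interleave ratio, and routing are illustrative, not MiniMax's.
import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    """Top-1 routing over a few expert MLPs: each token activates one expert."""

    def __init__(self, hidden: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x):
        top1 = self.router(x).argmax(dim=-1)  # (batch, seq) expert index per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


class HybridBlock(nn.Module):
    """One decoder block: (linear or softmax) attention followed by an MoE MLP."""

    def __init__(self, hidden: int, use_softmax: bool):
        super().__init__()
        self.use_softmax = use_softmax
        self.softmax_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.moe = MoEFeedForward(hidden)

    def forward(self, x):
        if self.use_softmax:
            attn_out, _ = self.softmax_attn(x, x, x, need_weights=False)
        else:
            # Stand-in for Lightning (linear) attention: phi(Q) (phi(K)^T V),
            # which avoids materializing the full seq x seq attention matrix.
            # Q/K/V projections and causal masking are omitted in this sketch.
            phi_q, phi_k = x.relu() + 1e-6, x.relu() + 1e-6
            kv = torch.einsum("bsd,bse->bde", phi_k, x)
            norm = phi_q @ phi_k.sum(dim=1).unsqueeze(-1)
            attn_out = torch.einsum("bsd,bde->bse", phi_q, kv) / norm
        x = x + attn_out
        return x + self.moe(x)


# Interleave: here every 4th block uses softmax attention (the ratio is made up).
blocks = nn.ModuleList(HybridBlock(64, use_softmax=(i % 4 == 3)) for i in range(8))
hidden_states = torch.randn(1, 16, 64)
for block in blocks:
    hidden_states = block(hidden_states)
```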

MiniMax-VL-01

MiniMax-VL-01 adopts the “ViT-MLP-LLM” framework, a commonly used design in the field of multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM. MiniMax-VL-01 has a notable dynamic-resolution feature: input images are resized according to a pre-set grid, with resolutions from 336×336 to 2016×2016, while a 336×336 thumbnail is kept. The resized image is split into non-overlapping patches of the same size; these patches and the thumbnail are encoded separately and then combined into a full image representation. The training data for MiniMax-VL-01 consists of caption, description, and instruction data. The Vision Transformer (ViT) is trained from scratch on 694 million image-caption pairs. Across four distinct stages of the training pipeline, a total of 512 billion tokens are processed, leveraging this vast amount of data to endow the model with strong capabilities. MiniMax-VL-01 has reached top-level performance on multimodal leaderboards, demonstrating its edge and dependability in complex multimodal tasks.
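To make the dynamic-resolution step concrete, here is a rough sketch of the idea as described: resize the image to a grid-aligned resolution between 336×336 and 2016×2016, split it into non-overlapping 336×336 tiles, and keep a 336×336 thumbnail. The function name and the naive grid selection are assumptions for illustration; the actual MiniMax-VL-01 image processor may work differently.

```python
# Illustrative sketch of the dynamic-resolution preprocessing described above,
# not the actual MiniMax-VL-01 image processor. Grid selection here is naive.
from PIL import Image

TILE = 336      # patch/thumbnail side length from the model description
MAX_GRID = 6    # 6 * 336 = 2016, the largest resolution mentioned


def split_into_tiles(image: Image.Image) -> tuple[list[Image.Image], Image.Image]:
    """Resize to a grid-aligned resolution, split into 336x336 tiles, keep a thumbnail."""
    # Pick the smallest grid that covers the image, capped at the 2016x2016 limit.
    grid_w = min(MAX_GRID, max(1, -(-image.width // TILE)))   # ceil division
    grid_h = min(MAX_GRID, max(1, -(-image.height // TILE)))

    resized = image.resize((grid_w * TILE, grid_h * TILE))
    tiles = [
        resized.crop((x * TILE, y * TILE, (x + 1) * TILE, (y + 1) * TILE))
        for y in range(grid_h)
        for x in range(grid_w)
    ]
    thumbnail = image.resize((TILE, TILE))
    return tiles, thumbnail


tiles, thumb = split_into_tiles(Image.new("RGB", (1000, 600)))
print(len(tiles), thumb.size)   # 6 tiles (3x2 grid) and a (336, 336) thumbnail
```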

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

@geetu040 (Author)

I would like to implement these models in transformers. But since these models are very large (456B parameters), I can only create smaller architectures while developing, and later test the final outputs and consistency of the full architecture on other machines. Does that sound possible, or should I avoid this altogether?

@ArthurZucker (Collaborator)

It sounds good!
I think the best way is to:

  1. create a dummy model from the original code (trust_remote_code=True)
  2. save the weights, and generate logits with a sentence
  3. create an equivalent model in transformers
  4. make the logits match! 🚀

My recommendation is to have a look at Mixtral, SwitchTransformers and Llama in general!
Also https://huggingface.co/docs/transformers/en/modular_transformers

FYI @Rocketknight1
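For illustration, steps 1 and 2 above might look roughly like the sketch below. The config field names being shrunk (num_hidden_layers, hidden_size, etc.) are assumptions based on common Transformers conventions; the released MiniMax config may use different names.

```python
# Hypothetical sketch of steps 1-2: build a tiny, randomly initialized model
# from the original remote code, save it, and record reference logits.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo = "MiniMaxAI/MiniMax-Text-01"
config = AutoConfig.from_pretrained(repo, trust_remote_code=True)

# Shrink the architecture so it fits on a normal machine (field names assumed).
config.num_hidden_layers = 2
config.hidden_size = 64
config.intermediate_size = 128
config.num_attention_heads = 4

model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.save_pretrained("minimax-text-01-dummy")

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    reference_logits = model(**inputs).logits
torch.save(reference_logits, "reference_logits.pt")
```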

@geetu040 (Author)

@ArthurZucker

> create a dummy model from the original code (trust_remote_code=True)

this dummy model should be a minimal architecture with a very small size, created by reducing the number of layers, attention heads, hidden_size, etc. in the config, right?
Loading the full-size model is going to be really difficult under normal resources.

> Also https://huggingface.co/docs/transformers/en/modular_transformers

Yes, I am planning to use modular transformers; I hope most of the code can be reused.
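For reference, a modular file typically reuses an existing model through subclassing, and the modeling file is then auto-generated from it. The sketch below assumes Mixtral is the closest starting point given the MoE architecture; the class names and the choice of base model are assumptions, not the final implementation.

```python
# modular_minimax_text_01.py (hypothetical) - a minimal modular-transformers
# sketch that reuses Mixtral code; the real PR may pick a different base model.
from transformers.models.mixtral.configuration_mixtral import MixtralConfig
from transformers.models.mixtral.modeling_mixtral import (
    MixtralForCausalLM,
    MixtralModel,
)


class MiniMaxText01Config(MixtralConfig):
    model_type = "minimax_text_01"


class MiniMaxText01Model(MixtralModel):
    # Inherits Mixtral's decoder stack; MiniMax-specific layers (e.g. the
    # Lightning Attention blocks) would be overridden here.
    pass


class MiniMaxText01ForCausalLM(MixtralForCausalLM):
    pass
```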

@Rocketknight1 (Member)

@geetu040 yes, the dummy model should be very small. It's okay for the model to be randomly initialized and to output garbage. What we want to check is that we get the same garbage with your implementation as we get with the original remote code implementation of the network.
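In practice, that check might look roughly like the sketch below: run the same input through the remote-code dummy checkpoint and the new in-library implementation, then assert the logits match within tolerance. The checkpoint path reuses the hypothetical dummy from the earlier sketch, and loading it without trust_remote_code assumes the ported model type is already registered in the local transformers source tree.

```python
# Rough sketch of the "same garbage in, same garbage out" check described above.
import torch
from transformers import AutoModelForCausalLM

# Original implementation, executed from the remote code shipped with the dummy.
original = AutoModelForCausalLM.from_pretrained(
    "minimax-text-01-dummy", trust_remote_code=True
)
# New in-library implementation (assumes the port is registered in transformers).
ported = AutoModelForCausalLM.from_pretrained("minimax-text-01-dummy")

input_ids = torch.tensor([[1, 42, 7, 99]])
with torch.no_grad():
    logits_original = original(input_ids).logits
    logits_ported = ported(input_ids).logits

# Random init gives meaningless logits, but both implementations must agree.
torch.testing.assert_close(logits_original, logits_ported, rtol=1e-4, atol=1e-4)
```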

@Shakib-IO

Hi @geetu040,
I was wondering if you'd be interested in collaborating on implementing this model. I'm currently exploring the NLP domain and am eager to gain hands-on coding experience by building a model. Let me know what you think.

@geetu040 (Author)

> Hi @geetu040, I was wondering if you'd be interested in collaborating on implementing this model. I'm currently exploring the NLP domain and am eager to gain hands-on coding experience by building a model. Let me know what you think.

@Shakib-IO, sure I can use some help

geetu040 linked a pull request on Jan 22, 2025 that will close this issue.
@geetu040 (Author)

@Shakib-IO, you wanted to help with the implementation. I have commented the future work in the code so that it can be easily tracked and worked on. I'll give you write access to this branch in my fork, where you can also push changes.

@Shakib-IO

Thanks @geetu040
