
Support Mixture of Expert (MoE) Models #32

Open
AlexCheema opened this issue Jul 18, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@AlexCheema
Contributor

AlexCheema commented Jul 18, 2024

[Image attachment: IMG_0084]

@AlexCheema
Contributor Author

[Image attachment: IMG_0085]

@AlexCheema AlexCheema added the enhancement New feature or request label Jul 18, 2024
@mzbac
Contributor

mzbac commented Jul 19, 2024

I looked at this yesterday. It would be great if exo could support DeepSeek V2; sharding the DeepseekV2DecoderLayer should be very similar to the llama sharding. But it may also be worth trying model parallelism -> ml-explore/mlx-examples#890
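
For context, here is a minimal sketch of what "similar to the llama sharding" could look like: each node owns a contiguous range of decoder layers, and because every MoE block lives inside its own DeepseekV2DecoderLayer, the experts simply travel with the layer. The `Shard` descriptor and the weight-group names below are hypothetical illustrations (loosely modeled on the DeepSeek-V2 config's `first_k_dense_replace` field), not exo's actual API.

```python
# Minimal sketch (hypothetical names, not exo's actual API): shard a
# DeepSeek-V2-style decoder stack by contiguous layer range, the same way
# the llama sharding works. Each MoE block lives inside its decoder layer,
# so a node only loads the experts for the layers in its own slice.
from dataclasses import dataclass


@dataclass
class Shard:
    start_layer: int   # first decoder layer this node owns (inclusive)
    end_layer: int     # last decoder layer this node owns (inclusive)
    n_layers: int      # total decoder layers in the model

    def contains(self, layer_idx: int) -> bool:
        return self.start_layer <= layer_idx <= self.end_layer


def weight_groups_for_node(shard: Shard, first_k_dense_replace: int = 1) -> list[str]:
    """List the weight groups this node needs to load.

    Mirrors the DeepSeek-V2 layout loosely: the first `first_k_dense_replace`
    layers use a dense FFN, the rest use routed + shared experts.
    """
    groups: list[str] = []
    for i in range(shard.n_layers):
        if not shard.contains(i):
            continue  # another node owns this layer (and all of its experts)
        groups.append(f"layers.{i}.self_attn")
        if i < first_k_dense_replace:
            groups.append(f"layers.{i}.mlp")                 # dense FFN
        else:
            groups.append(f"layers.{i}.mlp.shared_experts")  # always active
            groups.append(f"layers.{i}.mlp.experts")         # routed experts
    return groups


# Example: two nodes split a 60-layer model in half; node 0 gets layers 0-29.
print(weight_groups_for_node(Shard(start_layer=0, end_layer=29, n_layers=60))[:5])
```

The point of the sketch is that pipeline-style layer sharding does not need to know anything MoE-specific; only the per-layer memory footprint changes.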

@345ishaan

Would like to work on this :)

@AlexCheema
Contributor Author

Would like to work on this :)

@345ishaan that would be great - go for it

@mintisan

Indeed, MoE is the most suitable application scenario for exo and should be prioritized for implementation.
Really looking forward to it

@youmego

youmego commented Aug 1, 2024

Looking forward to support for the MoE DeepSeek V2 (total: 236B, active: 21B):
+-----------------------+---------------+-------------------+----------------+----------------+
| Model                 | #Total Params | #Activated Params | Context Length | Download       |
+-----------------------+---------------+-------------------+----------------+----------------+
| DeepSeek-V2           | 236B          | 21B               | 128k           | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B          | 21B               | 128k           | 🤗 HuggingFace |
+-----------------------+---------------+-------------------+----------------+----------------+

@345ishaan

Looking forward to support for the MoE DeepSeek V2 (total: 236B, active: 21B):

+-----------------------+---------------+-------------------+----------------+----------------+
| Model                 | #Total Params | #Activated Params | Context Length | Download       |
+-----------------------+---------------+-------------------+----------------+----------------+
| DeepSeek-V2           | 236B          | 21B               | 128k           | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B          | 21B               | 128k           | 🤗 HuggingFace |
+-----------------------+---------------+-------------------+----------------+----------------+

Yeah, I was planning to experiment with the setup using https://github.com/deepseek-ai/DeepSeek-Coder-V2. I'll be looking into it this weekend.

@ChaseKolozsy

@AlexCheema

Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

@AlexCheema
Contributor Author

@AlexCheema

Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

Yes! We can definitely do something like this.
We are moving towards a more general distributed AI framework that will enable things like this.
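
For illustration, here is a rough sketch of that placement idea. Everything in it (the node model, the weight-group names, the `place` helper) is made up for the example rather than taken from exo: the always-active weights (attention, shared experts) are pinned to the CUDA node, and the routed experts, which are only occasionally active, are spread across nodes that mainly contribute memory.

```python
# Hypothetical sketch, not an exo feature: keep the always-active weights on
# the CUDA GPU and park the rarely-active routed experts on other nodes.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    has_cuda: bool
    memory_gb: float
    assigned: list[str] = field(default_factory=list)
    used_gb: float = 0.0


def place(weights: dict[str, float], nodes: list[Node]) -> None:
    """`weights` maps a weight-group name to its size in GB.

    Groups whose name contains "experts" (but not "shared") are treated as
    cold; everything else is hot and pinned to the CUDA node.
    """
    gpu = next(n for n in nodes if n.has_cuda)            # assume one CUDA node
    memory_nodes = [n for n in nodes if not n.has_cuda] or [gpu]

    for name, size in sorted(weights.items(), key=lambda kv: -kv[1]):
        if "experts" in name and "shared" not in name:
            # Cold experts go to whichever node has the most free memory left.
            target = max(memory_nodes, key=lambda n: n.memory_gb - n.used_gb)
        else:
            target = gpu                                   # hot path -> GPU
        target.assigned.append(name)
        target.used_gb += size


nodes = [Node("cuda-box", True, 24.0), Node("mac-mini", False, 64.0)]
place(
    {
        "layers.1.self_attn": 1.2,
        "layers.1.mlp.shared_experts": 0.6,
        "layers.1.mlp.experts": 8.0,
    },
    nodes,
)
for n in nodes:
    print(n.name, "->", n.assigned)
```

In practice the routed experts would still need to be fetched or executed remotely when the router picks them, so the real win depends on routing locality, but the partitioning itself can stay this simple.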
