
Support Mixture of Expert (MoE) Models #32

Open
AlexCheema opened this issue Jul 18, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@AlexCheema
Contributor

AlexCheema commented Jul 18, 2024

[Image attachment: IMG_0084]

@AlexCheema
Contributor Author

[Image attachment: IMG_0085]

@AlexCheema AlexCheema added the enhancement New feature or request label Jul 18, 2024
@mzbac
Contributor

mzbac commented Jul 19, 2024

I looked at this yesterday. It would be great if exo could support DeepSeek V2; sharding the DeepseekV2DecoderLayer should be very similar to the llama sharding. But it may also be worth trying model parallelism -> ml-explore/mlx-examples#890
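
For context, here is a minimal sketch of what "similar to the llama sharding" could look like: each node owns a contiguous range of decoder layers, and because every MoE block lives inside its own DeepseekV2DecoderLayer, the experts simply travel with the layer. The `Shard` descriptor and the weight-group names below are hypothetical illustrations (loosely modeled on the DeepSeek-V2 config's `first_k_dense_replace` field), not exo's actual API.

```python
# Minimal sketch (hypothetical names, not exo's actual API): shard a
# DeepSeek-V2-style decoder stack by contiguous layer range, the same way
# the llama sharding works. Each MoE block lives inside its decoder layer,
# so a node only loads the experts for the layers in its own slice.
from dataclasses import dataclass


@dataclass
class Shard:
    start_layer: int   # first decoder layer this node owns (inclusive)
    end_layer: int     # last decoder layer this node owns (inclusive)
    n_layers: int      # total decoder layers in the model

    def contains(self, layer_idx: int) -> bool:
        return self.start_layer <= layer_idx <= self.end_layer


def weight_groups_for_node(shard: Shard, first_k_dense_replace: int = 1) -> list[str]:
    """List the weight groups this node needs to load.

    Mirrors the DeepSeek-V2 layout loosely: the first `first_k_dense_replace`
    layers use a dense FFN, the rest use routed + shared experts.
    """
    groups: list[str] = []
    for i in range(shard.n_layers):
        if not shard.contains(i):
            continue  # another node owns this layer (and all of its experts)
        groups.append(f"layers.{i}.self_attn")
        if i < first_k_dense_replace:
            groups.append(f"layers.{i}.mlp")                 # dense FFN
        else:
            groups.append(f"layers.{i}.mlp.shared_experts")  # always active
            groups.append(f"layers.{i}.mlp.experts")         # routed experts
    return groups


# Example: two nodes split a 60-layer model in half; node 0 gets layers 0-29.
print(weight_groups_for_node(Shard(start_layer=0, end_layer=29, n_layers=60))[:5])
```

The point of the sketch is that pipeline-style layer sharding does not need to know anything MoE-specific; only the per-layer memory footprint changes.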

@345ishaan

Would like to work on this :)

@AlexCheema
Contributor Author

Would like to work on this :)

@345ishaan that would be great - go for it

@mintisan

Indeed, MoE is the most suitable application scenario for exo and should be prioritized for implementation.
Really looking forward to it

@youmego

youmego commented Aug 1, 2024

Looking forward to support for the MoE DeepSeek V2 (total: 236B, active: 21B):
+-----------------------+---------------+-------------------+----------------+----------------+
| Model                 | #Total Params | #Activated Params | Context Length | Download       |
+-----------------------+---------------+-------------------+----------------+----------------+
| DeepSeek-V2           | 236B          | 21B               | 128k           | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B          | 21B               | 128k           | 🤗 HuggingFace |
+-----------------------+---------------+-------------------+----------------+----------------+

@345ishaan

Looking forward to support for the MoE DeepSeek V2 (total: 236B, active: 21B):

+-----------------------+---------------+-------------------+----------------+----------------+
| Model                 | #Total Params | #Activated Params | Context Length | Download       |
+-----------------------+---------------+-------------------+----------------+----------------+
| DeepSeek-V2           | 236B          | 21B               | 128k           | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B          | 21B               | 128k           | 🤗 HuggingFace |
+-----------------------+---------------+-------------------+----------------+----------------+

Yeah, I was planning to experiment with the setup using https://github.com/deepseek-ai/DeepSeek-Coder-V2. I'll be looking into it this weekend.

@ChaseKolozsy

@AlexCheema

Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

@AlexCheema
Contributor Author

@AlexCheema

Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

Yes! We can definitely do something like this.
We are moving towards a more general distributed AI framework that will enable things like this.
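
For illustration, here is a rough sketch of that placement idea. Everything in it (the node model, the weight-group names, the `place` helper) is made up for the example rather than taken from exo: the always-active weights (attention, shared experts) are pinned to the CUDA node, and the routed experts, which are only occasionally active, are spread across nodes that mainly contribute memory.

```python
# Hypothetical sketch, not an exo feature: keep the always-active weights on
# the CUDA GPU and park the rarely-active routed experts on other nodes.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    has_cuda: bool
    memory_gb: float
    assigned: list[str] = field(default_factory=list)
    used_gb: float = 0.0


def place(weights: dict[str, float], nodes: list[Node]) -> None:
    """`weights` maps a weight-group name to its size in GB.

    Groups whose name contains "experts" (but not "shared") are treated as
    cold; everything else is hot and pinned to the CUDA node.
    """
    gpu = next(n for n in nodes if n.has_cuda)            # assume one CUDA node
    memory_nodes = [n for n in nodes if not n.has_cuda] or [gpu]

    for name, size in sorted(weights.items(), key=lambda kv: -kv[1]):
        if "experts" in name and "shared" not in name:
            # Cold experts go to whichever node has the most free memory left.
            target = max(memory_nodes, key=lambda n: n.memory_gb - n.used_gb)
        else:
            target = gpu                                   # hot path -> GPU
        target.assigned.append(name)
        target.used_gb += size


nodes = [Node("cuda-box", True, 24.0), Node("mac-mini", False, 64.0)]
place(
    {
        "layers.1.self_attn": 1.2,
        "layers.1.mlp.shared_experts": 0.6,
        "layers.1.mlp.experts": 8.0,
    },
    nodes,
)
for n in nodes:
    print(n.name, "->", n.assigned)
```

In practice the routed experts would still need to be fetched or executed remotely when the router picks them, so the real win depends on routing locality, but the partitioning itself can stay this simple.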
