[Bugfix][Kernel] Fix moe align block issue for mixtral #12413
base: main
Conversation
Signed-off-by: ElizaWszola <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these: … 🚀
Could you please take a look, @jinzhen-lin?
Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: ElizaWszola <[email protected]>
Force-pushed from 349d986 to e215ef6
Signed-off-by: ElizaWszola <[email protected]>
Thanks for the fix!
Basically we were only asking for the shared memory referenced here:

vllm/csrc/moe/moe_align_sum_kernels.cu, lines 35 to 36 (in 3132a93)
vllm/csrc/moe/moe_align_sum_kernels.cu, lines 229 to 230 (in 3132a93)

So we were trying to use more shared memory than we asked for. Since we only need …
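For context, here is a minimal CUDA sketch of the bug class being discussed, not vLLM's actual kernel (`demo_align_kernel`, `tokens_cnts`, and `cumsum` are illustrative names): every array the kernel carves out of the single `extern __shared__` buffer has to be covered by the dynamic shared-memory size passed at launch, otherwise the kernel silently reads and writes past what it asked for.

```cuda
// Minimal sketch of the failure mode: the launch must request enough dynamic
// shared memory to cover *every* array carved out of the extern __shared__
// buffer inside the kernel. All names here are illustrative, not vLLM's.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

__global__ void demo_align_kernel(int32_t* out, int num_experts) {
  extern __shared__ int32_t smem[];
  // Two logical arrays share one dynamic buffer:
  int32_t* tokens_cnts = smem;           // num_experts entries
  int32_t* cumsum = smem + num_experts;  // num_experts + 1 entries

  int tid = threadIdx.x;
  if (tid < num_experts) tokens_cnts[tid] = tid;  // fake per-expert counts
  __syncthreads();

  if (tid == 0) {
    // Prefix sum over the counts; touches cumsum[0..num_experts].
    cumsum[0] = 0;
    for (int i = 0; i < num_experts; ++i) {
      cumsum[i + 1] = cumsum[i] + tokens_cnts[i];
    }
  }
  __syncthreads();

  if (tid < num_experts) out[tid] = cumsum[tid + 1];
}

int main() {
  const int num_experts = 8;
  int32_t* out = nullptr;
  cudaMallocManaged(&out, num_experts * sizeof(int32_t));

  // The request must match the kernel's partitioning: num_experts entries for
  // tokens_cnts plus (num_experts + 1) for cumsum. Requesting only one of the
  // two (the bug pattern) lets the kernel index past the reserved memory.
  size_t shared_bytes = (2 * num_experts + 1) * sizeof(int32_t);
  demo_align_kernel<<<1, 32, shared_bytes>>>(out, num_experts);
  cudaDeviceSynchronize();

  for (int i = 0; i < num_experts; ++i) printf("%d ", out[i]);
  printf("\n");
  cudaFree(out);
  return 0;
}
```

Building a sketch like this with nvcc and running it under compute-sanitizer is a quick way to flag the variant where the request is smaller than the kernel's actual usage.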
Thank you for the careful fix
Fix an issue with shared arrays in `moe_align_block_size_kernel` that was causing Mixtral inference to crash.

Testing: run inference with
llm = LLM(model="TheBloke/Mixtral-8x7B-v0.1-GPTQ")