[Bugfix][Kernel] Fix moe align block issue for mixtral #12413

ElizaWszola · 2025-01-24T16:54:48Z

Fix an issue with shared arrays in moe_align_block_size_kernel that was causing Mixtral inference to crash.

Testing: run inference with:

    from vllm import LLM
    llm = LLM(model="TheBloke/Mixtral-8x7B-v0.1-GPTQ")

Signed-off-by: ElizaWszola <[email protected]>


mgoin · 2025-01-24T17:14:23Z

Could you please take a look @jinzhen-lin

csrc/moe/moe_align_sum_kernels.cu

Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: ElizaWszola <[email protected]>

csrc/moe/moe_align_sum_kernels.cu

Signed-off-by: ElizaWszola <[email protected]>

tlrmchlsmth

Thanks for the fix!

tlrmchlsmth · 2025-01-24T22:29:18Z

Basically, we were only requesting num_experts + 1 elements of shared memory for cumsum, but the kernel computed the tokens_cnts offset as if cumsum needed max(num_experts, warp_size) + 1 elements.

vllm/csrc/moe/moe_align_sum_kernels.cu, lines 35 to 36 in 3132a93:

    int32_t* cumsum = shared_mem;  // 1d tensor with shape (num_experts + 1)
    token_cnts_t* tokens_cnts = (token_cnts_t*)(shared_mem + blockDim.x + 1);

vllm/csrc/moe/moe_align_sum_kernels.cu, lines 229 to 230 in 3132a93:

    const int32_t shared_mem_i32 =
        ((num_thread + 1) * num_experts + (num_experts + 1)) * sizeof(int32_t);

So the kernel was addressing more shared memory than it had requested. Since cumsum only needs num_experts + 1 elements, this fix starts tokens_cnts right after them instead of at blockDim.x + 1.
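A minimal Python sketch of the arithmetic above, assuming (as the comment implies) that the kernel is launched with num_thread = blockDim.x = max(num_experts, WARP_SIZE), that WARP_SIZE is 32, and that token counts take the int32 path. `shared_mem_layout` is a hypothetical helper, not part of vLLM: it compares the int32 elements requested on the host with the elements addressed by the device-side pointers, before and after moving tokens_cnts to offset num_experts + 1.

```python
# Hypothetical helper (not vLLM code): models the shared-memory layout of
# moe_align_block_size_kernel in int32 elements. Assumptions: int32 token
# counts, and num_thread == blockDim.x == max(num_experts, WARP_SIZE).
WARP_SIZE = 32  # assumption: NVIDIA warp size


def shared_mem_layout(num_experts: int, fixed: bool) -> tuple[int, int]:
    """Return (allocated_elems, used_elems) for a given expert count."""
    num_thread = max(num_experts, WARP_SIZE)
    # Host side: elements requested at launch (shared_mem_i32 / sizeof(int32_t)).
    allocated = (num_thread + 1) * num_experts + (num_experts + 1)
    # cumsum occupies num_experts + 1 elements starting at offset 0.
    cumsum_len = num_experts + 1
    # Device side: where tokens_cnts begins, before vs. after the fix.
    tokens_cnts_offset = cumsum_len if fixed else num_thread + 1
    # tokens_cnts has shape (blockDim.x + 1, num_experts).
    tokens_cnts_len = (num_thread + 1) * num_experts
    used = tokens_cnts_offset + tokens_cnts_len
    return allocated, used


if __name__ == "__main__":
    for num_experts in (8, 64):  # 8 experts, as in Mixtral-8x7B
        for fixed in (False, True):
            allocated, used = shared_mem_layout(num_experts, fixed)
            status = "OVERFLOW" if used > allocated else "ok"
            print(f"experts={num_experts} fixed={fixed} "
                  f"allocated={allocated} used={used} {status}")
```

Under these assumptions, 8 experts gives 273 elements allocated but 297 addressed before the fix, i.e. 24 elements (96 bytes) past the end of the allocation. With 64 experts, blockDim.x equals num_experts, so the old and new offsets coincide, which would explain why the crash only showed up for models with fewer experts than a warp has threads.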

mgoin

Thank you for the careful fix

Fix moe align block issue for mixtral

4c6ca52

Signed-off-by: ElizaWszola <[email protected]>

ElizaWszola mentioned this pull request in Release v0.7.0 #12365 (open, 8 tasks) on Jan 24, 2025

mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 24, 2025

tlrmchlsmth reviewed Jan 24, 2025

csrc/moe/moe_align_sum_kernels.cu (outdated, resolved)

tlrmchlsmth reviewed Jan 24, 2025

csrc/moe/moe_align_sum_kernels.cu (outdated, resolved)

Update csrc/moe/moe_align_sum_kernels.cu

e215ef6

Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: ElizaWszola <[email protected]>

ElizaWszola force-pushed the fix-mixtral-align-block branch from 349d986 to e215ef6 on January 24, 2025 20:33

tlrmchlsmth reviewed Jan 24, 2025

csrc/moe/moe_align_sum_kernels.cu (outdated, resolved)

make it simpler

70570d7

Signed-off-by: ElizaWszola <[email protected]>

tlrmchlsmth approved these changes Jan 24, 2025

simon-mo enabled auto-merge (squash) January 24, 2025 22:48

mgoin approved these changes Jan 25, 2025
