
CPU/CUDA: fix GQA mul mat back, add CUDA support #11380

Merged: 1 commit into ggerganov:master on Jan 24, 2025

Conversation

JohannesGaessler (Collaborator):

On master the backward pass for matrix multiplication does not work correctly when GQA broadcasting is involved. However, this goes undetected because all of the relevant gradient tests are skipped for speed. This PR fixes the backward pass and adds CUDA support. To make the backward pass work I am adding an extra parameter to ggml_repeat_back, because the GQA broadcasting is different from e.g. the one in ggml_repeat.

This PR also makes minor fixes to other backward passes. After this PR it should not be necessary to make further changes to ggml ops for #10544 .

@github-actions bot added the testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 23, 2025
ggerganov (Owner):

Can the adjacent logic be performed automatically, without explicitly passing the argument to ggml_repeat_back()? Not 100% sure, but maybe by checking whether the repeat operation requires broadcasting (i.e. nr1 > 1 || nr2 > 1) and then using the adjacent == true branch? I could be missing something though.

JohannesGaessler (Collaborator, Author):

No, the problem is that the shape is the same but different values need to be iterated over. Although now that I'm writing this, I realize that you could get the same result by inserting a call to ggml_view and adding CUDA support for noncontiguous inputs. I'll do that instead.

ggerganov (Owner):

Yup, sounds like a better alternative.

JohannesGaessler (Collaborator, Author):

I found and fixed another bug in the CUDA code for OUT_PROD related to dimension 1 not being contiguous.

@JohannesGaessler JohannesGaessler merged commit 8137b4b into ggerganov:master Jan 24, 2025
45 checks passed
anagri pushed a commit to BodhiSearch/llama.cpp that referenced this pull request on Jan 26, 2025