CUDA: fix mul_mat_vec for CC 6.0 #11775
Open
+5
−0
Fixes #10318 (comment).
The problem is that by default the code is compiled for compute capabilities 5.2, 6.1, 7.0, and 7.5. A GP100 has compute capability 6.0, the minimum for FP16 intrinsics. The host code says that it can do MMV with those intrinsics, but without `GGML_CUDA_F16` there is no actual device code available. This PR is more of a band-aid fix that just makes GPUs with compute capability 6.0 use FP32 arithmetic if the code was not compiled with `GGML_CUDA_F16`. Medium-term I intend to revise the handling of these intrinsics, and I'll do a proper fix at that time.
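The mismatch described above can be illustrated with a small host-side sketch. This is not the code changed in this PR: every function and constant name below is hypothetical, compute capabilities are encoded as integers (6.0 becomes 600) purely for illustration, and the sketch assumes a GPU runs the device code built for the highest compiled architecture that does not exceed its own compute capability.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the ggml implementation.
// Compute capabilities are encoded as major*100 + minor*10, e.g. 6.0 -> 600.
constexpr int CC_FP16_MIN = 600; // 6.0 is the minimum for FP16 intrinsics

// Assumption: the GPU runs device code built for the highest compiled
// architecture that is <= its own compute capability. Returns 0 if none fits.
int highest_compiled_arch(const std::vector<int>& compiled_archs, int device_cc) {
    int best = 0;
    for (int arch : compiled_archs) {
        if (arch <= device_cc) {
            best = std::max(best, arch);
        }
    }
    return best;
}

// Naive host-side check: looks only at the physical device.
bool naive_fp16_available(int device_cc) {
    return device_cc >= CC_FP16_MIN;
}

// Stricter check: additionally requires that the device code the GPU will
// actually run was compiled for an architecture with FP16 intrinsics.
bool fp16_device_code_available(const std::vector<int>& compiled_archs, int device_cc) {
    return highest_compiled_arch(compiled_archs, device_cc) >= CC_FP16_MIN;
}
```

With the default architecture list from the description, `{520, 610, 700, 750}`, a device with compute capability 600 passes the naive check but fails the stricter one, because the best matching device code was built for 520. In that situation falling back to FP32 arithmetic is the safe choice.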