
CUDA: fix mul_mat_vec for CC 6.0 #11775

Open · JohannesGaessler wants to merge 1 commit into master

Conversation

JohannesGaessler (Collaborator)

Fixes #10318 (comment).

The problem is that by default the code is compiled for compute capabilities 5.2, 6.1, 7.0, and 7.5. A GP100 has compute capability 6.0, the minimum for FP16 intrinsics. The host code reports that it can do MMV with those intrinsics, but without GGML_CUDA_F16 there is no actual device code available. This PR is more of a band-aid fix that simply makes GPUs with compute capability 6.0 use FP32 arithmetic if the code was not compiled with GGML_CUDA_F16. Medium-term I intend to revise the handling of these intrinsics and will do a proper fix at that time.
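
To make the mismatch concrete, here is a hedged sketch of the failure mode; the kernel and the host-side commentary are simplified stand-ins, not the actual ggml sources:

```cpp
#include <cuda_fp16.h>

// Hedged sketch of the failure mode (simplified; not the actual ggml kernels).
__global__ void mul_mat_vec_sketch(const half2 *x, const half2 *y, float *dst) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    // FP16 path: half2 intrinsics like __hmul2 are only emitted for CC >= 6.0.
    const half2 p = __hmul2(x[threadIdx.x], y[threadIdx.x]);
    dst[threadIdx.x] = __low2float(p) + __high2float(p);
#else
    // Compiled for CC 5.2, this branch is what ends up in the binary:
    // the FP16 path above does not exist in the generated device code.
    dst[threadIdx.x] = 0.0f;
#endif
}

// Host side: fast_fp16_available(cc) is true for a GP100 (cc == 600) because
// the hardware supports fast FP16. But with the default target list
// {5.2, 6.1, 7.0, 7.5} no device code was built for CC 6.0; the runtime falls
// back to the 5.2 variant, so dispatching the FP16 path misbehaves or crashes.
```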

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Feb 9, 2025
slaren (Collaborator) commented on Feb 9, 2025

Wouldn't it be possible to check if the architecture is in __CUDA_ARCH_LIST__? Maybe we should do that more often instead of assuming that certain kernels are available.
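
For context, __CUDA_ARCH_LIST__ is a macro that nvcc (with recent CUDA toolkits) defines in both the host and device compilation passes, listing every architecture the translation unit is being compiled for. A hedged illustration, assuming the default target list described in the PR description:

```cpp
// Illustrative only: for a build targeting CCs 5.2, 6.1, 7.0, and 7.5, e.g.
//   nvcc -gencode arch=compute_52,code=sm_52 \
//        -gencode arch=compute_61,code=sm_61 \
//        -gencode arch=compute_70,code=sm_70 \
//        -gencode arch=compute_75,code=sm_75 ...
// nvcc defines:
//   __CUDA_ARCH_LIST__ == 520,610,700,750
// Note that 600 is absent, which is exactly what a compile-time check can detect.
```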

A review comment on the changed lines:

```cpp
    const enum ggml_prec prec = fast_fp16_available(cc) ? ggml_prec(dst->op_params[0]) : GGML_PREC_F32;
#else
    // FIXME by default there is no code for CC 6.0 so trying to use FP16 intrinsics results in a crash
    const enum ggml_prec prec = fast_fp16_available(cc) && cc != 600 ? ggml_prec(dst->op_params[0]) : GGML_PREC_F32;
```

Add to common code:

```cpp
#ifdef __CUDA_ARCH_LIST__
constexpr bool ggml_cuda_has_arch_impl(int) {
    return false;
}

template<class ... Archs>
constexpr bool ggml_cuda_has_arch_impl(int arch, int first, Archs... rest) {
    return arch == first || ggml_cuda_has_arch_impl(arch, rest...);
}

constexpr bool ggml_cuda_has_arch(int arch) {
    return ggml_cuda_has_arch_impl(arch, __CUDA_ARCH_LIST__);
}
#else
constexpr bool ggml_cuda_has_arch(int) {
    return false;
}
#endif // __CUDA_ARCH_LIST__
```

Then:

Suggested change:

```diff
-const enum ggml_prec prec = fast_fp16_available(cc) && cc != 600 ? ggml_prec(dst->op_params[0]) : GGML_PREC_F32;
+const enum ggml_prec prec = fast_fp16_available(cc) && (cc != 600 || ggml_cuda_has_arch(600)) ? ggml_prec(dst->op_params[0]) : GGML_PREC_F32;
```

This will keep the check at compile time, so it shouldn't add any overhead. Though I am sure you could come up with a more generic check.
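
For intuition, a hedged sketch of how the constexpr lookup resolves, with the architecture list values assumed to be those of the default build (520, 610, 700, 750):

```cpp
// Illustrative only: the literal list below stands in for __CUDA_ARCH_LIST__
// as nvcc would define it for the default build (note there is no 600 entry).
constexpr bool ggml_cuda_has_arch_impl(int) { return false; }

template<class ... Archs>
constexpr bool ggml_cuda_has_arch_impl(int arch, int first, Archs... rest) {
    // Compile-time recursion: compare against the head, then the tail.
    return arch == first || ggml_cuda_has_arch_impl(arch, rest...);
}

constexpr bool ggml_cuda_has_arch(int arch) {
    // In the real suggestion this is ggml_cuda_has_arch_impl(arch, __CUDA_ARCH_LIST__).
    return ggml_cuda_has_arch_impl(arch, 520, 610, 700, 750);
}

static_assert( ggml_cuda_has_arch(610), "sm_61 device code is available");
static_assert(!ggml_cuda_has_arch(600), "no sm_60 device code -> use FP32");
```

Because every call is constexpr with constant arguments, the comparison chain folds away during compilation and the runtime check reduces to a constant.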
