
ggml : fix more imatrix nan cases #11773

Open · slaren wants to merge 1 commit into master

Conversation

@slaren (Collaborator) commented on Feb 9, 2025

Fixes #11764

The changes to the other eps comparisons, making them check the absolute value of the max, are not directly related to this issue, but I believe those comparisons were also incorrect.

@slaren (Collaborator, Author) commented on Feb 9, 2025

@bartowski1182 I only tested the 7B model, not sure if the issue with the 13B model is the same.

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 9, 2025
@@ -384,7 +384,7 @@ static float make_qx_quants(int n, int nmax, const float * restrict x, int8_t *
         float ax = fabsf(x[i]);
         if (ax > amax) { amax = ax; max = x[i]; }
     }
-    if (amax < GROUP_MAX_EPS) { // all zero
+    if (fabsf(amax) < GROUP_MAX_EPS) { // all zero
A Contributor commented on this diff:

Didn't we already use fabsf just above? How is this extra fabsf supposed to help?

@@ -3021,7 +3021,7 @@ static void quantize_row_iq2_xxs_impl(const float * restrict x, void * restrict
     }
     float max = xval[0];
     for (int i = 1; i < 32; ++i) max = MAX(max, xval[i]);
-    if (max < GROUP_MAX_EPS) {
+    if (fabsf(max) < GROUP_MAX_EPS) {
A Contributor commented on this diff:

xval contains the absolute values of the model weights, so how is this extra fabsf supposed to help?

@ikawrakow (Contributor) commented:

Perhaps the following will help you to actually fix NaNs and asserts in imatrix-guided quantization:

Let's denote the model weights in a block with $x_i$ and their importance as defined by the imatrix with $w_i$. The condition that is required for things to work is

$$\sum w_i |x_i| > 0\quad\quad(1)$$

where the sum is over a quantization block. It doesn't matter how many GROUP_MAX_EPS one uses, or how many times one takes the absolute value of something that is already an absolute value; you will not avoid NaNs (or asserts) unless the above is satisfied. Speaking of asserts, this "fix" does avoid the assert, but it does absolutely nothing to prevent a meaningless quantization.

To know what to do when (1) is not satisfied, one should first check if

$$\sum |x_i| > 0\quad\quad(2)$$

If (2) is not satisfied, we know that all model weights in the block are zero, so we can simply set all quants to zero and proceed with the next block.

If (2) is satisfied but (1) is not, it means that (a) all imatrix values in the block are zero, or, the more tricky case, (b) the non-zero imatrix values happen to coincide with zero model weights. In that case, the responsible thing to do would be to abort the quantization and tell the user to go fix their imatrix. But if this is considered inadequate for the many non-/semi-technical users of llama.cpp, the actual fix would be to ignore the imatrix in that block (it is bogus) and to set the importance of $x_i$ to something like $x_i^2$, as was used in the pre-imatrix days.
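
To make the above concrete, here is a minimal sketch in C of the per-block handling described in this comment. It is not the llama.cpp implementation: the helper block_weight_check, its signature, and the enum names are hypothetical, with x holding the block's model weights $x_i$, quant_weights holding the imatrix importances $w_i$ (or NULL when no imatrix is used), and weight receiving the per-weight importance passed to the quantization search.

```c
// Minimal sketch (hypothetical helper, not the llama.cpp API): classify a
// quantization block according to conditions (1) and (2) above and fill the
// per-weight importance used by the quantization search.
#include <math.h>
#include <stddef.h>

enum block_case {
    BLOCK_ALL_ZERO,      // (2) fails: every x_i is zero, emit all-zero quants
    BLOCK_BOGUS_IMATRIX, // (2) holds, (1) fails: imatrix ignored, x_i^2 used instead
    BLOCK_OK             // (1) holds: imatrix importance used as-is
};

static enum block_case block_weight_check(const float * x, const float * quant_weights,
                                          float * weight, size_t n) {
    double sum_ax  = 0.0; // sum_i |x_i|        -> condition (2)
    double sum_wax = 0.0; // sum_i w_i * |x_i|  -> condition (1)
    for (size_t i = 0; i < n; ++i) {
        const float ax = fabsf(x[i]);
        const float w  = quant_weights ? quant_weights[i] : 1.0f;
        sum_ax  += ax;
        sum_wax += (double) w * ax;
    }
    if (!(sum_ax > 0.0)) {
        // All model weights in the block are zero: the caller can set all
        // quants to zero and move on to the next block.
        return BLOCK_ALL_ZERO;
    }
    if (!(sum_wax > 0.0)) {
        // The imatrix carries no information for this block: either abort and
        // ask the user to fix their imatrix, or fall back to the pre-imatrix
        // importance x_i^2 as done here.
        for (size_t i = 0; i < n; ++i) {
            weight[i] = x[i] * x[i];
        }
        return BLOCK_BOGUS_IMATRIX;
    }
    for (size_t i = 0; i < n; ++i) {
        weight[i] = quant_weights ? quant_weights[i] : x[i] * x[i];
    }
    return BLOCK_OK;
}
```

Note that in this reading an extra fabsf changes nothing: amax and the entries of xval are already non-negative, so only block-level checks like the sums above determine whether the imatrix can be trusted.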

Successfully merging this pull request may close the following issue: Misc. bug: Quantizing Olmo models with imatrix failing on some sizes