
ggml : fix more imatrix nan cases #11773

Open · slaren wants to merge 1 commit into master

Conversation

@slaren (Collaborator) commented on Feb 9, 2025

Fixes #11764

The changes to the other eps comparisons, making them check the absolute value of the max, are not directly related to this issue, but I believe those comparisons were also incorrect.

@slaren (Collaborator, Author) commented on Feb 9, 2025

@bartowski1182 I only tested the 7B model, not sure if the issue with the 13B model is the same.

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 9, 2025
@@ -384,7 +384,7 @@ static float make_qx_quants(int n, int nmax, const float * restrict x, int8_t *
         float ax = fabsf(x[i]);
         if (ax > amax) { amax = ax; max = x[i]; }
     }
-    if (amax < GROUP_MAX_EPS) { // all zero
+    if (fabsf(amax) < GROUP_MAX_EPS) { // all zero
A Contributor commented on this diff:

Didn't we already use fabsf just above? How is this extra fabsf supposed to help?

@@ -3021,7 +3021,7 @@ static void quantize_row_iq2_xxs_impl(const float * restrict x, void * restrict
     }
     float max = xval[0];
     for (int i = 1; i < 32; ++i) max = MAX(max, xval[i]);
-    if (max < GROUP_MAX_EPS) {
+    if (fabsf(max) < GROUP_MAX_EPS) {
A Contributor commented on this diff:

xval contains the absolute values of the model weights, so how is this extra fabsf supposed to help?

@ikawrakow (Contributor) commented:

Perhaps the following will help you to actually fix NaNs and asserts in imatrix-guided quantization:

Let's denote the model weights in a block with $x_i$ and their importance as defined by the imatrix with $w_i$. The condition that is required for things to work is

$$\sum w_i |x_i| > 0\quad\quad(1)$$

where the sum is over a quantization block. It doesn't matter how many GROUP_MAX_EPS one uses, or how many times one takes the absolute value of something that is already an absolute value; you will not avoid NaNs (or asserts) unless the above is satisfied. Speaking of asserts, this "fix" does avoid the assert, but it does absolutely nothing to prevent a meaningless quantization.

To know what to do when (1) is not satisfied, one should first check if

$$\sum |x_i| > 0\quad\quad(2)$$

If (2) is not satisfied, we know that all model weights in the block are zero, so we can simply set all quants to zero and proceed with the next block.

If (2) is satisfied but (1) is not, it means that (a) all imatrix values in the block are zero, or, the more tricky case, (b) the non-zero imatrix values happen to coincide with zero model weights. In that case, the responsible thing to do would be to abort the quantization and tell the user to go fix their imatrix. But if this is considered inadequate for the many non-/semi-technical users of llama.cpp, the actual fix would be to ignore the imatrix in that block (it is bogus) and to set the importance of $x_i$ to something like $x_i^2$, as was used in the pre-imatrix days.
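
To make the above concrete, here is a minimal sketch in C of the per-block handling described in this comment. It is not the llama.cpp implementation: the helper block_weight_check, its signature, and the enum names are hypothetical, with x holding the block's model weights $x_i$, quant_weights holding the imatrix importances $w_i$ (or NULL when no imatrix is used), and weight receiving the per-weight importance passed to the quantization search.

```c
// Minimal sketch (hypothetical helper, not the llama.cpp API): classify a
// quantization block according to conditions (1) and (2) above and fill the
// per-weight importance used by the quantization search.
#include <math.h>
#include <stddef.h>

enum block_case {
    BLOCK_ALL_ZERO,      // (2) fails: every x_i is zero, emit all-zero quants
    BLOCK_BOGUS_IMATRIX, // (2) holds, (1) fails: imatrix ignored, x_i^2 used instead
    BLOCK_OK             // (1) holds: imatrix importance used as-is
};

static enum block_case block_weight_check(const float * x, const float * quant_weights,
                                          float * weight, size_t n) {
    double sum_ax  = 0.0; // sum_i |x_i|        -> condition (2)
    double sum_wax = 0.0; // sum_i w_i * |x_i|  -> condition (1)
    for (size_t i = 0; i < n; ++i) {
        const float ax = fabsf(x[i]);
        const float w  = quant_weights ? quant_weights[i] : 1.0f;
        sum_ax  += ax;
        sum_wax += (double) w * ax;
    }
    if (!(sum_ax > 0.0)) {
        // All model weights in the block are zero: the caller can set all
        // quants to zero and move on to the next block.
        return BLOCK_ALL_ZERO;
    }
    if (!(sum_wax > 0.0)) {
        // The imatrix carries no information for this block: either abort and
        // ask the user to fix their imatrix, or fall back to the pre-imatrix
        // importance x_i^2 as done here.
        for (size_t i = 0; i < n; ++i) {
            weight[i] = x[i] * x[i];
        }
        return BLOCK_BOGUS_IMATRIX;
    }
    for (size_t i = 0; i < n; ++i) {
        weight[i] = quant_weights ? quant_weights[i] : x[i] * x[i];
    }
    return BLOCK_OK;
}
```

Note that in this reading an extra fabsf changes nothing: amax and the entries of xval are already non-negative, so only block-level checks like the sums above determine whether the imatrix can be trusted.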

Successfully merging this pull request may close the following issue: Misc. bug: Quantizing Olmo models with imatrix failing on some sizes