
Fix visual encoders with no CLS #11982

Merged: 1 commit into ggml-org:master on Feb 21, 2025

Conversation

alex-jw-brooks (Contributor) commented on Feb 20, 2025

This PR fixes the bug outlined in issue #10157, which has also been discussed in projects that leverage llama.cpp, such as ollama: ollama/ollama#7441, ollama/ollama-python#433.

Summary

In clip.cpp, we initialize a "patches" vector, which is then used to index into the embedding tensor with a get-rows op.

With the CPU backend, this triggers an out-of-bounds assertion whenever the visual encoder has no CLS embedding, e.g., siglip. Concretely:

  • Siglip produces 729 patch embeddings and has no CLS
  • This causes the "patches" vector to be initialized with the values [1, 2, ..., 729] instead of the correct [0, 1, ..., 728]
  • Because of the offset for a CLS that isn't there, the last index (729) is out of range and triggers the assertion (see the sketch below)
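
For illustration, here is a minimal sketch of the off-by-one and the fix; the function and flag names are mine for this write-up, not the exact clip.cpp diff:

```cpp
#include <cstdint>
#include <vector>

// Sketch of how the "patches" index vector is built. With a CLS token, row 0
// of the embedding tensor is the CLS embedding, so patch rows start at 1.
// Without CLS (e.g., siglip), patch rows start at 0 and must not be shifted.
std::vector<int32_t> build_patch_indices(int num_patches, bool has_class_embedding) {
    std::vector<int32_t> patches(num_patches);
    const int32_t offset = has_class_embedding ? 1 : 0; // the fix: only skip row 0 when a CLS row exists
    for (int i = 0; i < num_patches; ++i) {
        patches[i] = i + offset; // previously hardcoded to i + 1
    }
    return patches; // siglip (729 patches, no CLS): [0, ..., 728] instead of the out-of-range [1, ..., 729]
}
```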

Steps to Verify

Pick a model whose visual encoder has no CLS. I verified this fix with granite vision, which uses siglip, but you can also check it with nanollava.

  1. Download the GGUF files from ollama:
wget https://registry.ollama.ai/v2/qnguyen3/nanollava/blobs/sha256:511ad0036913a93bd04aa1c08de98bcdfa15bcbe0e03e5e9e4334039531ba863 -O model.gguf

wget https://registry.ollama.ai/v2/qnguyen3/nanollava/blobs/sha256:8a16a1e306eba4791488fd4f9585403ecb03da9b71d9f36e8944c33f35ca8754 -O projector.gguf
  2. Build the llava CLI: cmake --build build --config Release --target llama-llava-cli

  3. Try running the model:

MODEL_GGUF_PATH=/Users/alexanderjbrooks/Desktop/nanollava/model.gguf
PROJECT_GGUF_PATH=/Users/alexanderjbrooks/Desktop/nanollava/projector.gguf
IMG=~/Desktop/duck.jpg

./build/bin/llama-llava-cli -m $MODEL_GGUF_PATH \
    --mmproj $PROJECT_GGUF_PATH \
    --image $IMG \
    -p "<|im_start|>user<image>\nCan you describe this image?<|im_end|>\n<|im_start|>assistant" \
    --temp 0

On main, it blows up because of the out-of-range patch index 729:

/Users/alexanderjbrooks/workspace/develop/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:8518: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
zsh: abort      ./build/bin/llama-llava-cli -m $MODEL_GGUF_PATH --mmproj $PROJECT_GGUF_PATH  
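
For context, the assertion is the row-bounds check inside ggml's CPU get-rows kernel (the GGML_ASSERT in the log). A simplified stand-in, illustrative rather than the verbatim ggml-cpu.c source:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the bounds check in the CPU get-rows kernel.
// ne01    = number of rows in the embedding tensor (729 for siglip)
// indices = the contents of the "patches" tensor
void get_rows_bounds_check(const int32_t * indices, int64_t n_indices, int64_t ne01) {
    for (int64_t r = 0; r < n_indices; ++r) {
        const int64_t i01 = indices[r];  // the last value is 729 before the fix
        assert(i01 >= 0 && i01 < ne01);  // 729 < 729 is false -> abort, as in the log above
    }
}
```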

On this branch, things are happy:

This image features a beautiful outdoor scene with a clear blue sky and a variety of flowers. The sky is dotted with fluffy white clouds, and there are several trees with pink and white flowers. The image also includes a tall, thin tree with pink flowers and a tall tree with pink flowers. There are also some bushes with pink flowers. The image is rich in detail, with various elements such as the sky, clouds, trees, and flowers.

@ngxson @ggerganov @gabe-l-hart PTAL when you can - this change is also needed to run granite vision models correctly (they are being added by this PR), but I'm decoupling the bug fix from the new model support 🙂

@ggerganov ggerganov merged commit ee02ad0 into ggml-org:master Feb 21, 2025