Replies: 1 comment
I found the issue here: it was a problem with the conversion to GGUF.
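For anyone hitting the same thing: a minimal sketch of what re-running the conversion looks like, assuming you still have the original (pre-GGUF) checkpoint and an up-to-date llama.cpp checkout, whose convert.py writes the grouped-query-attention metadata the loader needs. The paths and output name below are hypothetical; adjust them to your setup.

```shell
# Hypothetical paths -- point these at your own checkpoint and output location.
MODEL_DIR=models/llama-2-70b           # original (pre-GGUF) checkpoint directory
OUT=models/llama-2-70b.f16.gguf

# Re-convert with a current llama.cpp tree so the GGUF header carries
# the attention head counts (including the KV head count) the loader expects.
python convert.py "$MODEL_DIR" --outtype f16 --outfile "$OUT"
```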
Hi, I have a Mac with an M2. I installed the latest llama-cpp-python (0.1.83) using:
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
When I run it against a GGUF-formatted LLaMA 2 model with:
export N_GQA=8 && python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 38
I get the error:
error loading model: create_tensor: tensor 'blk.0.attn_k.weight' has wrong shape; expected 8192, 8192, got 8192, 1024, 1, 1
llama_load_model_from_file: failed to load model
Anyone know how to resolve this?
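For context on the numbers in that error: the shape found in the file is what grouped-query attention (GQA) produces for LLaMA-2 70B. The K projection maps the 8192-wide embedding down to only n_head_kv · head_dim = 1024 columns, so 8192 × 1024 is the correct on-disk shape, and the "expected 8192, 8192" side means the loader was not applying GQA (which is what the N_GQA=8 override works around). A quick sketch of the arithmetic, using the published LLaMA-2 70B dimensions:

```shell
# LLaMA-2 70B attention dimensions (from the published model config)
n_embd=8192       # embedding width
n_head=64         # query heads
n_head_kv=8       # key/value heads under GQA -- this is what N_GQA=8 encodes

head_dim=$((n_embd / n_head))       # 8192 / 64 = 128 per head
kv_dim=$((head_dim * n_head_kv))    # 128 * 8 = 1024, matching "got 8192, 1024"
echo "attn_k weight shape: ${n_embd} x ${kv_dim}"
```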