Replies: 7 comments 7 replies
-
@abetlen I would recommend adding these commands (or a link to this guide) to the docs/README.
-
I want to add that you may need to use:
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off"
pip install --force-reinstall --no-cache-dir llama-cpp-python
-
Note: I wrote this guide for Windows CMD (I just think it's simpler and more commonly used). For PowerShell the commands would be as follows:
$env:FORCE_CMAKE='1'; $env:CMAKE_ARGS='-DLLAMA_CUBLAS=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off'
pip install llama-cpp-python --no-cache-dir
-
This saved me from my frustration after many failed attempts. Why has the README not been amended though?
-
Thanks so much! I was trying to get this working for so long.
-
Note: this environment variable is now out of date; you will get an error if you try to use it during compilation.
-
UPDATED COMMANDS
As of 2/2/2025, these are the updated commands.
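For reference, upstream llama.cpp has since renamed the cuBLAS option, so a hedged sketch of the current CMD commands (the GGML_CUDA flag name is assumed from recent llama.cpp releases; verify against the latest docs):

```shell
:: Windows CMD sketch; newer llama.cpp uses GGML_CUDA instead of the deprecated LLAMA_CUBLAS
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DGGML_CUDA=on"
pip install --force-reinstall --no-cache-dir llama-cpp-python
```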
-
So after a few frustrating weeks of not being able to successfully install with cuBLAS support, I finally managed to piece it all together.
The commands to successfully install on windows (using cmd) are as follows:
If your hardware doesn't support AVX/AVX2 you HAVE to set the appropriate CMake arguments to off, otherwise the build fails!
You can remove
-DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off
(or set them to on) if your hardware supports them.
Important note: please also notice how the command to set the CMAKE_ARGS environment variable omits the double quotes (")! This is critical.
The following (as mentioned in the docs) is actually incorrect in windows!
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
The correct way would be as follows:
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on" && pip install llama-cpp-python
Notice how the quotes start before CMAKE_ARGS! It's not a typo, just Windows CMD things: you either do this or omit the quotes entirely.
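Putting the pieces together, the full CMD sequence for hardware without AVX/AVX2 support is:

```shell
:: Windows CMD; quotes placed before the variable name, as noted above
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off"
pip install llama-cpp-python --no-cache-dir --force-reinstall
```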
(If using PowerShell, look here.)
To get the latest version from GitHub, if you don't want to rely on pip releases (which usually lag a few days behind, sometimes more), run the following:
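A sketch of that command, assuming the standard pip-from-git form for this repository (pip clones the repo, including its submodules, before building):

```shell
:: Sketch: install straight from the GitHub repo instead of a pip release
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on"
pip install git+https://github.com/abetlen/llama-cpp-python.git --no-cache-dir
```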
It's unfortunate that the documentation doesn't mention this stuff properly and mixes up Linux/macOS and Windows commands, which are clearly not the same; how is a newcomer supposed to figure this out?
The AVX flags especially are mentioned basically nowhere (massive thanks to @jllllll for his prebuilt wheels, which led me to these CMake arguments).
Another thing that should be mentioned:
Prerequisites:
Visual Studio 2022 (community edition is enough)
CUDA Toolkit (I tested with versions 11.7 to 12.3, it all works)

You need at least these checkboxes checked in the CUDA installer; I recommend unchecking the display driver if you already have a newer NVIDIA driver installed.
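Before building, a quick CMD check that the CUDA Toolkit's compiler is installed and on PATH (both commands are stock Windows/CUDA tools, nothing project-specific):

```shell
:: Confirm the CUDA compiler is reachable before attempting the build
where nvcc
nvcc --version
```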