Replies: 7 comments 7 replies
-
@abetlen I would recommend adding these commands (or a link to this guide) to the docs/README.
-
I want to add that you may need to use:
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off"
pip install --force-reinstall --no-cache-dir llama-cpp-python
-
Note: I wrote this guide for Windows CMD (I just think it's simpler and more commonly used). For PowerShell the commands would be as follows:
$env:FORCE_CMAKE='1'; $env:CMAKE_ARGS='-DLLAMA_CUBLAS=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off'
pip install llama-cpp-python --no-cache-dir
-
This saved me from my frustration after many failed attempts. Why has the README not been amended though?
-
Thanks so much! I was trying to get this working for so long.
-
Note: this environment variable is now out of date; you will get an error if you try to use it during compilation.
-
UPDATED COMMANDS
As of 2/2/2025, these are the updated commands.
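For reference, upstream llama.cpp has since renamed the cuBLAS option, so a hedged sketch of the current CMD commands (the GGML_CUDA flag name is assumed from recent llama.cpp releases; verify against the latest docs):

```shell
:: Windows CMD sketch; newer llama.cpp uses GGML_CUDA instead of the deprecated LLAMA_CUBLAS
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DGGML_CUDA=on"
pip install --force-reinstall --no-cache-dir llama-cpp-python
```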
-
So after a few frustrating weeks of not being able to successfully install with cuBLAS support, I finally managed to piece it all together.
The commands to successfully install on windows (using cmd) are as follows:
If your hardware doesn't support AVX/AVX2 you HAVE to set the appropriate CMake arguments to off, otherwise the build fails!
You can remove
-DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off
(or set them to on) if your hardware supports them.
Important note: please also notice how the command to set the CMAKE_ARGS environment variable omits the double quotes (")! This is critical.
The following (as mentioned in the docs) is actually incorrect in windows!
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
The correct way would be as follows:
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on" && pip install llama-cpp-python
Notice how the quotes start before CMAKE_ARGS! It's not a typo, just Windows CMD things: you either do this or omit the quotes entirely.
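Putting the pieces together, the full CMD sequence for hardware without AVX/AVX2 support is:

```shell
:: Windows CMD; quotes placed before the variable name, as noted above
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off"
pip install llama-cpp-python --no-cache-dir --force-reinstall
```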
(If using PowerShell, look here.)
To get the latest version from GitHub, if you don't want to rely on pip releases (which usually lag a few days behind, sometimes more), run the following:
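A sketch of that command, assuming the standard pip-from-git form for this repository (pip clones the repo, including its submodules, before building):

```shell
:: Sketch: install straight from the GitHub repo instead of a pip release
set FORCE_CMAKE=1
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on"
pip install git+https://github.com/abetlen/llama-cpp-python.git --no-cache-dir
```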
It's unfortunate that the documentation doesn't mention this stuff properly and mixes up Linux/macOS and Windows commands, which are clearly not the same; how is a newcomer supposed to figure this out?
The AVX flags especially are mentioned basically nowhere (massive thanks to @jllllll for his prebuilt wheels, which led me to these CMake arguments).
Another thing that should be mentioned:
Prerequisites:
Visual Studio 2022 (community edition is enough)
CUDA Toolkit (I tested with versions 11.7 to 12.3, it all works)

You need at least these checkboxes checked in the CUDA installer; I recommend unchecking the display driver if you already have a newer NVIDIA driver installed.
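Before building, a quick CMD check that the CUDA Toolkit's compiler is installed and on PATH (both commands are stock Windows/CUDA tools, nothing project-specific):

```shell
:: Confirm the CUDA compiler is reachable before attempting the build
where nvcc
nvcc --version
```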