Releases · ggerganov/llama.cpp

24 Jan 12:32

8137b4b

b4543 Latest

CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)

Assets 23

cudart-llama-bin-win-cu11.7-x64.zip, 303 MB, 2025-01-24T12:32:42Z
cudart-llama-bin-win-cu12.4-x64.zip, 373 MB, 2025-01-24T12:32:50Z
llama-b4543-bin-macos-arm64.zip, 19.9 MB, 2025-01-24T12:32:59Z
llama-b4543-bin-macos-x64.zip, 21.3 MB, 2025-01-24T12:33:00Z
llama-b4543-bin-ubuntu-x64.zip, 23.2 MB, 2025-01-24T12:33:01Z
llama-b4543-bin-win-avx-x64.zip, 13.8 MB, 2025-01-24T12:33:02Z
llama-b4543-bin-win-avx2-x64.zip, 13.9 MB, 2025-01-24T12:33:02Z
llama-b4543-bin-win-avx512-x64.zip, 13.9 MB, 2025-01-24T12:33:03Z
llama-b4543-bin-win-cuda-cu11.7-x64.zip, 152 MB, 2025-01-24T12:33:04Z
llama-b4543-bin-win-cuda-cu12.4-x64.zip, 151 MB, 2025-01-24T12:33:08Z
Source code (zip), 2025-01-24T11:38:31Z
Source code (tar.gz), 2025-01-24T11:38:31Z

24 Jan 12:03

github-actions

b4542

1af6945

cmake : avoid -march=native when reproducible build is wanted (#11366)

See https://reproducible-builds.org/ for why reproducible builds are desirable
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of the SOURCE_DATE_EPOCH variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317

This patch was done while working on reproducible builds for openSUSE.
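
As a rough illustration of the rule described above, the decision can be sketched in C++. This is not the CMake change itself: the helper names are hypothetical, and the sketch assumes that a set SOURCE_DATE_EPOCH environment variable is the signal that a reproducible build is wanted.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not from the llama.cpp build scripts): decides whether
// host-specific tuning such as -march=native may be enabled. A set
// SOURCE_DATE_EPOCH is treated as the signal that a reproducible build is wanted.
bool allow_native_tuning(const char * source_date_epoch) {
    return source_date_epoch == nullptr;
}

// Convenience wrapper that reads the variable from the process environment.
bool allow_native_tuning_from_env() {
    return allow_native_tuning(std::getenv("SOURCE_DATE_EPOCH"));
}
```

Passing the value in as a parameter keeps the decision testable without touching the real environment.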

Assets 23

23 Jan 21:34

github-actions

b4539

564804b

tests: fix some mul_mat test gaps (#11375)

Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
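
The coverage gap can be sketched as follows, assuming a simple dispatch rule in which batch sizes up to 8 take the batched mat-vec path and anything larger takes the mat-mat path. The names are hypothetical and the real backend selection logic may differ.

```cpp
#include <cassert>
#include <vector>

enum class MulPath { BatchedMatVec, MatMat };

// Threshold taken from the release note ("up to n==8"); the name is hypothetical.
constexpr int kMaxBatchedMatVecN = 8;

// Assumed dispatch rule: small n uses the batched mat-vec shaders.
MulPath select_path(int n) {
    return n <= kMaxBatchedMatVecN ? MulPath::BatchedMatVec : MulPath::MatMat;
}

// Returns true when the tested batch sizes exercise both code paths.
bool covers_both_paths(const std::vector<int> & ns) {
    bool saw_mat_vec = false;
    bool saw_mat_mat = false;
    for (int n : ns) {
        if (select_path(n) == MulPath::BatchedMatVec) {
            saw_mat_vec = true;
        } else {
            saw_mat_mat = true;
        }
    }
    return saw_mat_vec && saw_mat_mat;
}
```

Under this assumption, a test list that stops at n==8 never reaches the mat-mat path, which is why adding n==9 matters.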

Assets 23

23 Jan 20:51

github-actions

b4538

05f63cc

Update documentation (#11373)

To show that -n, -ngl, and --ngl are all acceptable.

Signed-off-by: Eric Curtin <[email protected]>

Assets 23

23 Jan 17:04

github-actions

b4537

f7fb43c

Add -ngl (#11372)

Most other llama.cpp cli tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <[email protected]>
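
A minimal sketch of accepting both spellings of the flag is shown below. The function names and the fallback behaviour are assumptions made for illustration; this is not the argument parser used by the tool.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical: treats the single-dash and double-dash spellings as the same flag.
bool is_ngl_flag(const std::string & arg) {
    return arg == "-ngl" || arg == "--ngl";
}

// Returns the value following the first ngl flag, or `fallback` when the flag
// is absent or its value is not a number.
int parse_ngl(const std::vector<std::string> & args, int fallback) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (!is_ngl_flag(args[i])) {
            continue;
        }
        try {
            return std::stoi(args[i + 1]);
        } catch (const std::exception &) {
            return fallback; // value is not a valid integer
        }
    }
    return fallback;
}
```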

Assets 23

23 Jan 13:49

github-actions

b4536

5845661

server : add more clean up when cancel_tasks is called (#11340)

* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
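
The std::remove_if steps above refer to the erase-remove idiom. A minimal sketch, assuming a hypothetical Task type and a plain vector as the queue (the server's own types are not shown in this release note):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <unordered_set>
#include <vector>

struct Task {
    int id; // hypothetical task identifier
};

// Removes every queued task whose id is in `cancelled` and returns how many
// were dropped. std::remove_if only moves the kept elements to the front; the
// erase call is what actually shrinks the container.
std::size_t drop_cancelled(std::vector<Task> & queue,
                           const std::unordered_set<int> & cancelled) {
    const auto first_removed = std::remove_if(
        queue.begin(), queue.end(),
        [&cancelled](const Task & t) { return cancelled.count(t.id) > 0; });
    const std::size_t n_dropped =
        static_cast<std::size_t>(std::distance(first_removed, queue.end()));
    queue.erase(first_removed, queue.end());
    return n_dropped;
}
```

Forgetting the erase step leaves stale elements at the tail of the container, which is a common source of follow-up fixes with this idiom.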

Assets 23

23 Jan 11:20

github-actions

b4535

f211d1d

Treat hf.co/ prefix the same as hf:// (#11350)

ollama uses hf.co/ as the Hugging Face prefix, just as RamaLama
uses hf://

Treat them similarly.

Signed-off-by: Eric Curtin <[email protected]>
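
Treating the two prefixes alike can be sketched as a small normalization step. The function name and the output parameter are hypothetical, and the sketch only strips the prefix; it makes no claim about how the remainder is resolved.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if `model` starts with "hf://" or "hf.co/", stores the
// remainder in `rest` and returns true; otherwise leaves `rest` untouched.
bool strip_hf_prefix(const std::string & model, std::string & rest) {
    static const char * const prefixes[] = { "hf://", "hf.co/" };
    for (const char * p : prefixes) {
        const std::string prefix(p);
        if (model.compare(0, prefix.size(), prefix) == 0) {
            rest = model.substr(prefix.size());
            return true;
        }
    }
    return false;
}
```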

Assets 23

23 Jan 08:18

github-actions

b4534

955a6c2

Vulkan-run-test: fix mmq_wg_denoms (#11343)

There appears to be a copy-and-paste error here.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.

Assets 23

23 Jan 08:10

github-actions

b4533

1971adf

vulkan: sort shaders for more deterministic binary (#11315)

Fixes #11306.
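
The general idea can be sketched as sorting the shader names before they are emitted, so that the output no longer depends on the order in which they were discovered. The release note does not say why the order varied, so that cause is an assumption, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: returns the shader names in a fixed (lexicographic) order so
// that generated output is the same regardless of the input order.
std::vector<std::string> emission_order(std::vector<std::string> shader_names) {
    std::sort(shader_names.begin(), shader_names.end());
    return shader_names;
}
```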

Assets 23

23 Jan 07:47

github-actions

b4532

5245729

vulkan: fix diag_mask_inf (#11323)

With robustBufferAccess disabled, this shader was showing out-of-bounds (OOB) stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
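
For reference, the role of the bounds check can be sketched on the CPU. This is a simplified single-channel version of the masking rule (an element is masked when its column index exceeds n_past plus its row index), with a loop grid rounded up to a hypothetical workgroup size. It is not the Vulkan shader.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Simplified CPU reference: x is row-major with `nrows` rows of `ncols` floats.
// The loops emulate a 2D dispatch rounded up to whole workgroups, so the
// bounds check is what discards the surplus invocations.
void diag_mask_inf_ref(std::vector<float> & x, int ncols, int nrows, int n_past) {
    const int wg = 4; // hypothetical workgroup edge length
    const int grid_cols = (ncols + wg - 1) / wg * wg;
    const int grid_rows = (nrows + wg - 1) / wg * wg;
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (int row = 0; row < grid_rows; ++row) {
        for (int col = 0; col < grid_cols; ++col) {
            if (col >= ncols || row >= nrows) {
                continue; // out-of-range invocation: must not store
            }
            if (col > n_past + row) {
                x[static_cast<std::size_t>(row) * ncols + col] = neg_inf;
            }
        }
    }
}
```

If the grid were sized with the two dimensions swapped, some in-range elements would never be visited, which matches the "wrong number of threads" symptom described above.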

Assets 23
