Releases: ggerganov/llama.cpp

b4543

24 Jan 12:32
8137b4b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)

b4542

24 Jan 12:03
1af6945
cmake : avoid -march=native when reproducible build is wanted (#11366)

See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317

This patch was done while working on reproducible builds for openSUSE.

b4539

23 Jan 21:34
564804b
tests: fix some mul_mat test gaps (#11375)

Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
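
The coverage gap can be sketched as follows. This is a minimal illustration, assuming a dispatcher that sends batch sizes up to 8 to the batched mat-vec shaders; the constant `kMaxMatVecCols` and the functions `choose_path` and `covers_mat_mat` are hypothetical names, not the actual ggml-vulkan or test-backend-ops code.

```cpp
#include <cstdint>
#include <vector>

// Assumption: the batched mat-vec shaders handle every n up to this value,
// as the commit message states ("up to n==8").
constexpr int64_t kMaxMatVecCols = 8;

enum class MulMatPath { MatVec, MatMat };

// Illustrative dispatcher only; the real backend logic is more involved.
MulMatPath choose_path(int64_t n) {
    return n <= kMaxMatVecCols ? MulMatPath::MatVec : MulMatPath::MatMat;
}

// Returns true if at least one tested batch size reaches the mat-mat path.
bool covers_mat_mat(const std::vector<int64_t> & sizes) {
    for (int64_t n : sizes) {
        if (choose_path(n) == MulMatPath::MatMat) {
            return true;
        }
    }
    return false;
}
```

Under this assumption, a size list that stops at 8 never reaches the mat-mat path, while adding 9 does.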

b4538

23 Jan 20:51
05f63cc
Update documentation (#11373)

Show that -n, -ngl, and --ngl are all accepted.

Signed-off-by: Eric Curtin <[email protected]>

b4537

23 Jan 17:04
f7fb43c
Add -ngl (#11372)

Most other llama.cpp CLI tools accept -ngl with a single dash.
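
As a rough sketch of what accepting several spellings of one option looks like, the helper below treats -n, -ngl, and --ngl as aliases. The names `is_ngl_flag` and `parse_ngl` are hypothetical and are not the tool's actual parser.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical helper: true if `arg` is one of the accepted spellings.
bool is_ngl_flag(const std::string & arg) {
    return arg == "-n" || arg == "-ngl" || arg == "--ngl";
}

// Returns the integer following the first recognized flag,
// or `fallback` if the flag is absent or its value is not a number.
int parse_ngl(const std::vector<std::string> & args, int fallback) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (is_ngl_flag(args[i])) {
            try {
                return std::stoi(args[i + 1]);
            } catch (const std::exception &) {
                return fallback;
            }
        }
    }
    return fallback;
}
```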

Signed-off-by: Eric Curtin <[email protected]>

b4536

23 Jan 13:49
5845661
server : add more clean up when cancel_tasks is called (#11340)

* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
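
For readers unfamiliar with the idiom the last two bullets refer to, here is a minimal sketch of removing cancelled entries from a queue with `std::remove_if` followed by `erase`. The `Task` struct and `cancel_tasks` signature are hypothetical stand-ins, not the server's actual types.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical task record; the real server task carries more state.
struct Task {
    int id;
};

// Removes every task whose id is in `cancelled`, preserving the order of the rest.
void cancel_tasks(std::vector<Task> & queue, const std::unordered_set<int> & cancelled) {
    // std::remove_if only shifts the kept elements to the front and returns
    // the new logical end; it does not shrink the container.
    auto new_end = std::remove_if(queue.begin(), queue.end(),
        [&cancelled](const Task & t) { return cancelled.count(t.id) > 0; });
    // The erase call is what actually drops the leftover tail.
    queue.erase(new_end, queue.end());
}
```

Forgetting the `erase` step is a common mistake with this idiom, since the container keeps its old size and the tail holds unspecified values.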

b4535

23 Jan 11:20
f211d1d
Treat hf.co/ prefix the same as hf:// (#11350)

ollama uses hf.co/ as its Hugging Face prefix, much as RamaLama
uses hf://

Treat them similarly.
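
One way to treat the two prefixes alike is to strip either one before resolving the model. The following is a sketch under that assumption; `strip_hf_prefix` is a hypothetical helper, not the function used in the patch.

```cpp
#include <string>

// Returns true and writes the remainder to `out` if `model` starts with
// "hf://" or "hf.co/"; otherwise returns false and leaves `out` untouched.
bool strip_hf_prefix(const std::string & model, std::string & out) {
    static const char * const prefixes[] = { "hf://", "hf.co/" };
    for (const char * p : prefixes) {
        const std::string prefix(p);
        if (model.compare(0, prefix.size(), prefix) == 0) {
            out = model.substr(prefix.size());
            return true;
        }
    }
    return false;
}
```

With this helper, "hf://org/model" and "hf.co/org/model" both reduce to "org/model".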

Signed-off-by: Eric Curtin <[email protected]>

b4534

23 Jan 08:18
955a6c2
Vulkan-run-test: fix mmq_wg_denoms (#11343)

This appears to be a copy-and-paste error.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.

b4533

23 Jan 08:10
1971adf
vulkan: sort shaders for more deterministic binary (#11315)

Fixes #11306.
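
The general idea can be sketched like this, assuming the shader names are collected in an order that may vary between builds. `emission_order` is a hypothetical function and the shader names in the usage note are only examples; the real generator is not shown here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts the collected shader names so that the order in which they are
// written out no longer depends on the order in which they were gathered.
std::vector<std::string> emission_order(std::vector<std::string> shader_names) {
    std::sort(shader_names.begin(), shader_names.end());
    return shader_names;
}
```

Two runs that gather the same names in different orders, say {"mul_mat_f16", "add_f32"} and {"add_f32", "mul_mat_f16"}, then produce the same sequence.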

b4532

23 Jan 07:47
5245729
vulkan: fix diag_mask_inf (#11323)

With robustBufferAccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
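
To show why the bounds check only helps when the dispatch matches the tensor layout, here is a CPU-side sketch that simulates a dispatch rounded up to whole blocks of columns. It assumes the usual causal-mask rule (an element is masked when col > n_past + row) and a row-major buffer; `diag_mask_inf` here, with its `block` parameter, is an illustrative stand-in, not the Vulkan shader.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Simulates a 2D dispatch over a row-major [nrows x ncols] buffer.
// The x dimension is rounded up to a multiple of `block`, so some
// simulated invocations fall past the end of the row.
void diag_mask_inf(std::vector<float> & data, int ncols, int nrows, int n_past, int block = 4) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    const int groups_x = (ncols + block - 1) / block;
    for (int row = 0; row < nrows; ++row) {
        for (int col = 0; col < groups_x * block; ++col) {
            // Bounds check: without it, the extra invocations would
            // store outside the current row.
            if (col >= ncols) {
                continue;
            }
            if (col > n_past + row) {
                data[static_cast<std::size_t>(row) * ncols + col] = neg_inf;
            }
        }
    }
}
```

If the two dispatch dimensions were swapped, the loop bounds would no longer correspond to rows and columns, and the check on `col` alone would not cover every stray invocation.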