Releases: ggerganov/llama.cpp
b4543
b4542
cmake : avoid -march=native when reproducible build is wanted (#11366) See https://reproducible-builds.org/ for why this is good and https://reproducible-builds.org/specs/source-date-epoch/ for the definition of this variable. Without this patch, compiling on different machines produced different binaries, which made verification of results difficult. Fixes: #11317 This patch was done while working on reproducible builds for openSUSE.
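The decision this entry describes can be sketched in a few lines. This is only an illustration of the logic, not the CMake change itself; the helper name `arch_flags` is hypothetical, and the sketch assumes that a set SOURCE_DATE_EPOCH environment variable is the signal that a reproducible build is wanted.

```python
import os


def arch_flags(env=None):
    """Return extra compiler flags, skipping -march=native for reproducible builds.

    Hypothetical helper: a set SOURCE_DATE_EPOCH is taken as the signal
    that the build should not depend on the host CPU.
    """
    env = os.environ if env is None else env
    if "SOURCE_DATE_EPOCH" in env:
        # Host-specific tuning would make binaries differ between machines.
        return []
    return ["-march=native"]
```

Passing the environment as a plain mapping keeps the check easy to exercise without touching the real process environment.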
b4539
tests: fix some mul_mat test gaps (#11375) Now that we have batched mat-vec mul Vulkan shaders for up to n==8, these tests weren't actually exercising the mat-mat mul path. Test n==9 as well. Also, change to use all_types.
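The coverage gap can be illustrated with a small sketch. The dispatch rule below is an assumption made for illustration (the names `mul_mat_path` and `paths_covered` are hypothetical); only the n==8 threshold comes from the note above.

```python
MAT_VEC_MAX_N = 8  # from the note: batched mat-vec shaders cover up to n == 8


def mul_mat_path(n):
    """Hypothetical dispatch rule: which kernel family handles n columns."""
    return "mat-vec" if n <= MAT_VEC_MAX_N else "mat-mat"


def paths_covered(test_ns):
    """Return the set of kernel paths that a list of test sizes exercises."""
    return {mul_mat_path(n) for n in test_ns}
```

Under this assumed rule, test sizes 1 through 8 only reach the mat-vec path, and adding n==9 is what brings the mat-mat path back under test.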
b4538
Update documentation (#11373) To show that -n, -ngl, and --ngl are acceptable. Signed-off-by: Eric Curtin <[email protected]>
b4537
Add -ngl (#11372) Most other llama.cpp cli tools accept -ngl with a single dash. Signed-off-by: Eric Curtin <[email protected]>
b4536
server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if
b4535
Treat hf.co/ prefix the same as hf:// (#11350) Ollama uses hf.co/ to specify the Hugging Face prefix, just as RamaLama uses hf://. Treat them similarly. Signed-off-by: Eric Curtin <[email protected]>
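A minimal sketch of treating both prefixes alike might look like the following. The function name `strip_hf_prefix` and the example model paths are hypothetical and are not taken from the actual change.

```python
HF_PREFIXES = ("hf://", "hf.co/")  # both spellings named in the note


def strip_hf_prefix(model):
    """Return the repo path if `model` uses a Hugging Face prefix, else None."""
    for prefix in HF_PREFIXES:
        if model.startswith(prefix):
            # Drop the prefix so both spellings resolve to the same path.
            return model[len(prefix):]
    return None
```

With this sketch, `strip_hf_prefix("hf://org/model")` and `strip_hf_prefix("hf.co/org/model")` both yield `"org/model"`, so later code does not need to know which spelling the user typed.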
b4534
Vulkan-run-test: fix mmq_wg_denoms (#11343) There appears to be a copy-and-paste error here: *mmq_wg_denoms should be used together with *warptile_mmq, instead of wg_denoms.
b4533
vulkan: sort shaders for more deterministic binary (#11315) Fixes #11306.
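Why sorting helps can be shown with a short sketch. It assumes, for illustration only, that shaders are bundled by concatenating name and contents in iteration order; the helper `bundle_digest` and the file names used with it are hypothetical.

```python
import hashlib


def bundle_digest(shaders, sort=True):
    """Hash a bundle built from a {name: bytes} mapping of shaders.

    With sort=True the names are visited in sorted order, so the result
    does not depend on the order in which the shaders were discovered.
    """
    names = sorted(shaders) if sort else list(shaders)
    h = hashlib.sha256()
    for name in names:
        h.update(name.encode())
        h.update(b"\0")  # separator between the name and its contents
        h.update(shaders[name])
    return h.hexdigest()
```

Two mappings with the same shaders inserted in different orders produce the same digest when sorted, and different digests when the discovery order is used directly.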
b4532
vulkan: fix diag_mask_inf (#11323) With robustBufferAccess disabled, this shader was showing OOB stores. There is a bounds check in the code, but the workgroup dimensions were reversed vs CUDA and it was running the wrong number of threads. So fix the workgroup dimensions and disable robustness for this pipeline.