Releases: ggerganov/llama.cpp

b4623

03 Feb 12:57
21c84b5
CUDA: fix Volta FlashAttention logic (#11615)

b4621

02 Feb 23:22
6eecde3
HIP: fix flash_attn_stream_k_fixup warning (#11604)

b4620

02 Feb 22:24
396856b
CUDA/HIP: add support for selectable warp size to mmv (#11519)

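A minimal sketch of what a selectable warp size buys a mat-vec (mmv) kernel: NVIDIA warps have 32 lanes while AMD GCN/CDNA wavefronts have 64, so hard-coding 32 leaves half of an AMD wavefront idle. The names and kernel below are illustrative only, not the actual ggml implementation, and AMD lane-mask details are glossed over.

```cuda
// Illustrative only: the warp/wavefront size is a template parameter instead
// of a hard-coded 32, so the same kernel can be built for NVIDIA (32 lanes)
// and AMD (64 lanes).
template <int warp_size>
__device__ float warp_reduce_sum(float x) {
    // Tree reduction across the lanes of one warp/wavefront.
    // (AMD's 64-bit lane masks / mask-less shuffle variants are glossed over.)
    for (int offset = warp_size / 2; offset > 0; offset /= 2) {
        x += __shfl_down_sync(0xffffffffu, x, offset, warp_size);
    }
    return x;
}

// One warp computes one row of y = A*x (row-major A with ncols columns).
template <int warp_size>
__global__ void mul_mat_vec(const float * A, const float * x, float * y, const int ncols) {
    const int row  = blockIdx.x;   // one warp/wavefront per output row
    const int lane = threadIdx.x;  // 0 .. warp_size-1

    float sum = 0.0f;
    for (int col = lane; col < ncols; col += warp_size) {
        sum += A[row * ncols + col] * x[col];
    }
    sum = warp_reduce_sum<warp_size>(sum);
    if (lane == 0) {
        y[row] = sum;
    }
}

// The host side picks the instantiation that matches the device, e.g.:
//   mul_mat_vec<32><<<nrows, 32>>>(A, x, y, ncols);   // NVIDIA
//   mul_mat_vec<64><<<nrows, 64>>>(A, x, y, ncols);   // AMD GCN/CDNA
```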

b4619

02 Feb 21:55
4d0598e
HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing cc architectu…

b4618

02 Feb 20:57
90f9b88
nit: more informative crash when grammar sampler fails (#11593)

b4617

02 Feb 19:12
864a0b6
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <[email protected]>
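
For context on the "__shfl_sync workaround for movmatrix" item: the PTX movmatrix instruction transposes a matrix fragment held in the registers of a warp, and where it is unavailable (e.g. on HIP) the same data movement can be emulated with warp shuffles. The sketch below is illustrative only, using a simplified 4x8 tile rather than the fragment layout of the actual FlashAttention kernels.

```cuda
// Simplified illustration (not the real FlashAttention fragment layout):
// a 4x8 tile of floats is stored one element per lane of a 32-lane warp,
// row-major, lane = row*8 + col. The transpose is done purely in registers by
// having every lane fetch its new element from the lane that currently holds
// it, using __shfl_sync instead of the PTX movmatrix instruction.
__device__ float transpose_4x8_tile(float v) {
    const int lane = threadIdx.x % 32;
    const int r_t  = lane / 4;       // row in the transposed 8x4 tile
    const int c_t  = lane % 4;       // col in the transposed 8x4 tile
    const int src  = c_t * 8 + r_t;  // lane holding that element pre-transpose
    return __shfl_sync(0xffffffffu, v, src);
}
```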

b4616

02 Feb 15:56
84ec8a5
Name colors (#11573)

It's more descriptive, and using #define's lets us rely on compile-time
string concatenation.

Signed-off-by: Eric Curtin <[email protected]>
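
A small sketch of the design choice described above, with illustrative macro names (not necessarily the ones used in llama.cpp): because the color codes are #define'd string literals rather than const char * variables, adjacent literals are merged by the compiler, so a colored message becomes a single compile-time string.

```cpp
#include <cstdio>

// Illustrative names; the actual macros in llama.cpp may differ.
#define ANSI_COL_RED     "\033[31m"
#define ANSI_COL_GREEN   "\033[32m"
#define ANSI_COL_DEFAULT "\033[0m"

int main() {
    // Adjacent string literals are concatenated at compile time,
    // so each format string below is a single literal.
    printf(ANSI_COL_RED "error:" ANSI_COL_DEFAULT " something went wrong\n");
    printf(ANSI_COL_GREEN "ok" ANSI_COL_DEFAULT "\n");
    return 0;
}
```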

b4615

02 Feb 10:26
bfcce4d
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…

b4614

02 Feb 10:11
6980448
Fix exotic CI env that lacks ostringstream::str (#11581)

b4613

02 Feb 08:51
ff22770
sampling : support for llguidance grammars (#10224)

* initial porting of previous LLG patch

* update for new APIs

* build: integrate llguidance as an external project

* use '%llguidance' as marker to enable llg lark syntax

* add some docs

* clarify docs

* code style fixes

* remove llguidance.h from .gitignore

* fix tests when llg is enabled

* pass vocab not model to llama_sampler_init_llg()

* copy test-grammar-integration.cpp to test-llguidance.cpp

* clang fmt

* fix ref-count bug

* build and run test

* gbnf -> lark syntax

* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag

* rename llguidance test file to test-grammar-llguidance.cpp

* add gh action for llg test

* align tests with LLG grammar syntax and JSON Schema spec

* llama_tokenizer() in fact requires valid utf8

* update llg

* format file

* add $LLGUIDANCE_LOG_LEVEL support

* fix whitespace

* fix warning

* include <cmath> for INFINITY

* add final newline

* fail llama_sampler_init_llg() at runtime

* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes

* simplify #includes

* improve doc string for LLAMA_LLGUIDANCE

* typo in merge

* bump llguidance to 0.6.12
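
A hedged sketch tying the bullets above together; the exact marker options, Lark dialect, and CLI flags are defined by the llguidance and llama.cpp docs, and the grammar below is illustrative only.

```cpp
// Hedged sketch: a grammar whose text starts with the '%llguidance' marker is
// handed to the llguidance sampler (Lark-style syntax) instead of the stock
// GBNF parser. Per the notes above, that sampler is created via
// llama_sampler_init_llg(), which takes the vocab rather than the model and
// fails at runtime if llama.cpp was built without LLAMA_LLGUIDANCE.
static const char * yes_no_grammar =
    "%llguidance\n"
    "\n"
    "start: \"yes\" | \"no\"\n";

// Logging from the llguidance engine can be raised via the
// LLGUIDANCE_LOG_LEVEL environment variable mentioned above, e.g.:
//   LLGUIDANCE_LOG_LEVEL=2 ./llama-cli --grammar-file my_grammar.lark ...
// (file name and log level are placeholders)
```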