Releases: ggerganov/llama.cpp

b4623

03 Feb 12:57
21c84b5
CUDA: fix Volta FlashAttention logic (#11615)

b4621

02 Feb 23:22
6eecde3
HIP: fix flash_attn_stream_k_fixup warning (#11604)

b4620

02 Feb 22:24
396856b
CUDA/HIP: add support for selectable warp size to mmv (#11519)

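A minimal sketch of what a selectable warp size buys a mat-vec (mmv) kernel: NVIDIA warps have 32 lanes while AMD GCN/CDNA wavefronts have 64, so hard-coding 32 leaves half of an AMD wavefront idle. The names and kernel below are illustrative only, not the actual ggml implementation, and AMD lane-mask details are glossed over.

```cuda
// Illustrative only: the warp/wavefront size is a template parameter instead
// of a hard-coded 32, so the same kernel can be built for NVIDIA (32 lanes)
// and AMD (64 lanes).
template <int warp_size>
__device__ float warp_reduce_sum(float x) {
    // Tree reduction across the lanes of one warp/wavefront.
    // (AMD's 64-bit lane masks / mask-less shuffle variants are glossed over.)
    for (int offset = warp_size / 2; offset > 0; offset /= 2) {
        x += __shfl_down_sync(0xffffffffu, x, offset, warp_size);
    }
    return x;
}

// One warp computes one row of y = A*x (row-major A with ncols columns).
template <int warp_size>
__global__ void mul_mat_vec(const float * A, const float * x, float * y, const int ncols) {
    const int row  = blockIdx.x;   // one warp/wavefront per output row
    const int lane = threadIdx.x;  // 0 .. warp_size-1

    float sum = 0.0f;
    for (int col = lane; col < ncols; col += warp_size) {
        sum += A[row * ncols + col] * x[col];
    }
    sum = warp_reduce_sum<warp_size>(sum);
    if (lane == 0) {
        y[row] = sum;
    }
}

// The host side picks the instantiation that matches the device, e.g.:
//   mul_mat_vec<32><<<nrows, 32>>>(A, x, y, ncols);   // NVIDIA
//   mul_mat_vec<64><<<nrows, 64>>>(A, x, y, ncols);   // AMD GCN/CDNA
```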

b4619

02 Feb 21:55
4d0598e
HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing cc architectu…

b4618

02 Feb 20:57
90f9b88
nit: more informative crash when grammar sampler fails (#11593)

b4617

02 Feb 19:12
864a0b6
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <[email protected]>
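
For context on the "__shfl_sync workaround for movmatrix" item: the PTX movmatrix instruction transposes a matrix fragment held in the registers of a warp, and where it is unavailable (e.g. on HIP) the same data movement can be emulated with warp shuffles. The sketch below is illustrative only, using a simplified 4x8 tile rather than the fragment layout of the actual FlashAttention kernels.

```cuda
// Simplified illustration (not the real FlashAttention fragment layout):
// a 4x8 tile of floats is stored one element per lane of a 32-lane warp,
// row-major, lane = row*8 + col. The transpose is done purely in registers by
// having every lane fetch its new element from the lane that currently holds
// it, using __shfl_sync instead of the PTX movmatrix instruction.
__device__ float transpose_4x8_tile(float v) {
    const int lane = threadIdx.x % 32;
    const int r_t  = lane / 4;       // row in the transposed 8x4 tile
    const int c_t  = lane % 4;       // col in the transposed 8x4 tile
    const int src  = c_t * 8 + r_t;  // lane holding that element pre-transpose
    return __shfl_sync(0xffffffffu, v, src);
}
```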

b4616

02 Feb 15:56
84ec8a5
Name colors (#11573)

It's more descriptive, and using #define's lets us rely on compile-time
string concatenation.

Signed-off-by: Eric Curtin <[email protected]>
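
A small sketch of the design choice described above, with illustrative macro names (not necessarily the ones used in llama.cpp): because the color codes are #define'd string literals rather than const char * variables, adjacent literals are merged by the compiler, so a colored message becomes a single compile-time string.

```cpp
#include <cstdio>

// Illustrative names; the actual macros in llama.cpp may differ.
#define ANSI_COL_RED     "\033[31m"
#define ANSI_COL_GREEN   "\033[32m"
#define ANSI_COL_DEFAULT "\033[0m"

int main() {
    // Adjacent string literals are concatenated at compile time,
    // so each format string below is a single literal.
    printf(ANSI_COL_RED "error:" ANSI_COL_DEFAULT " something went wrong\n");
    printf(ANSI_COL_GREEN "ok" ANSI_COL_DEFAULT "\n");
    return 0;
}
```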

b4615

02 Feb 10:26
bfcce4d
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…

b4614

02 Feb 10:11
6980448
Fix exotic CI env that lacks ostringstream::str (#11581)

b4613

02 Feb 08:51
ff22770
sampling : support for llguidance grammars (#10224)

* initial porting of previous LLG patch

* update for new APIs

* build: integrate llguidance as an external project

* use '%llguidance' as marker to enable llg lark syntax

* add some docs

* clarify docs

* code style fixes

* remove llguidance.h from .gitignore

* fix tests when llg is enabled

* pass vocab not model to llama_sampler_init_llg()

* copy test-grammar-integration.cpp to test-llguidance.cpp

* clang fmt

* fix ref-count bug

* build and run test

* gbnf -> lark syntax

* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag

* rename llguidance test file to test-grammar-llguidance.cpp

* add gh action for llg test

* align tests with LLG grammar syntax and JSON Schema spec

* llama_tokenizer() in fact requires valid utf8

* update llg

* format file

* add $LLGUIDANCE_LOG_LEVEL support

* fix whitespace

* fix warning

* include <cmath> for INFINITY

* add final newline

* fail llama_sampler_init_llg() at runtime

* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes

* simplify #includes

* improve doc string for LLAMA_LLGUIDANCE

* typo in merge

* bump llguidance to 0.6.12
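
A hedged sketch tying the bullets above together; the exact marker options, Lark dialect, and CLI flags are defined by the llguidance and llama.cpp docs, and the grammar below is illustrative only.

```cpp
// Hedged sketch: a grammar whose text starts with the '%llguidance' marker is
// handed to the llguidance sampler (Lark-style syntax) instead of the stock
// GBNF parser. Per the notes above, that sampler is created via
// llama_sampler_init_llg(), which takes the vocab rather than the model and
// fails at runtime if llama.cpp was built without LLAMA_LLGUIDANCE.
static const char * yes_no_grammar =
    "%llguidance\n"
    "\n"
    "start: \"yes\" | \"no\"\n";

// Logging from the llguidance engine can be raised via the
// LLGUIDANCE_LOG_LEVEL environment variable mentioned above, e.g.:
//   LLGUIDANCE_LOG_LEVEL=2 ./llama-cli --grammar-file my_grammar.lark ...
// (file name and log level are placeholders)
```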