Releases: OuadiElfarouki/llama.cpp

b2589

03 Apr 16:24
1ff4d9f
Add OpenChat, Alpaca, Vicuna chat templates (#6397)

* Add openchat chat template

* Add chat template test for openchat

* Add chat template for vicuna

* Add chat template for orca-vicuna

* Add EOS for vicuna templates

* Combine vicuna chat templates

* Add tests for openchat and vicuna chat templates

* Add chat template for alpaca

* Add separate template name for vicuna-orca

* Remove alpaca, match deepseek with jinja output

* Regenerate chat template test with add_generation_prompt

* Separate deepseek bos from system message

* Match openchat template with jinja output

* Remove BOS token from templates, unprefix openchat

b2586

03 Apr 11:43
5260486
[SYCL] Disable iqx on Windows as a workaround (#6435)

* disable the IQ-family (iqx) quantization kernels on Windows as a workaround

* use an array instead of global_memory

b2585

02 Apr 12:34
f87f7b8
flake.lock: Update (#6402)

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

b2568

28 Mar 16:21
be55134
convert : refactor vocab selection logic (#6355)

b2549

27 Mar 13:31
1e13987
embedding : show full embedding for single prompt (#6342)

* embedding : show full embedding for single prompt

To support the use case of creating an embedding for a given prompt, the entire embedding is now printed, not just its first values.

Also, the cosine similarity matrix is shown only when there is more than one prompt, since for a single prompt it is always `1.00` (see the sketch after this release note).

* Update examples/embedding/embedding.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b2456

18 Mar 12:45
ac9ee6a
ci : disable stale issue messages (#6126)

b2454

18 Mar 11:57
2bf8d0f
backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <[email protected]>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs

* reduce max inputs per split

* more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <[email protected]>