Releases: OuadiElfarouki/llama.cpp

b2589

03 Apr 16:24
1ff4d9f
Add OpenChat, Alpaca, Vicuna chat templates (#6397)

* Add openchat chat template

* Add chat template test for openchat

* Add chat template for vicuna

* Add chat template for orca-vicuna

* Add EOS for vicuna templates

* Combine vicuna chat templates

* Add tests for openchat and vicuna chat templates

* Add chat template for alpaca

* Add separate template name for vicuna-orca

* Remove alpaca, match deepseek with jinja output

* Regenerate chat template test with add_generation_prompt

* Separate deepseek bos from system message

* Match openchat template with jinja output

* Remove BOS token from templates, unprefix openchat

b2586

03 Apr 11:43
5260486
[SYCL] Disable iqx on Windows as a workaround (#6435)

* disable the IQ-family (iqx) quantization kernels on Windows as a workaround

* use an array instead of global_memory

b2585

02 Apr 12:34
f87f7b8
flake.lock: Update (#6402)

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

b2568

28 Mar 16:21
be55134
convert : refactor vocab selection logic (#6355)

b2549

27 Mar 13:31
1e13987
embedding : show full embedding for single prompt (#6342)

* embedding : show full embedding for single prompt

To support the use case of creating an embedding for a given prompt, the entire embedding is now printed, not just its first values.

Also, the cosine similarity matrix is shown only when there is more than one prompt, since for a single prompt it is always `1.00` (see the sketch after this release note).

* Update examples/embedding/embedding.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b2456

18 Mar 12:45
ac9ee6a
ci : disable stale issue messages (#6126)

b2454

18 Mar 11:57
2bf8d0f
backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <[email protected]>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs

* reduce max inputs per split

* more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <[email protected]>