[pull] master from ggerganov:master #32

pull · 2024-05-23T21:39:23Z

See Commits and Changes for more details.


* Add scaffolding for ggml logging macros
* Metal backend now uses GGML logging
* Cuda backend now uses GGML logging
* Cann backend now uses GGML logging
* Add enum tag to parameters
* Use C memory allocation funcs
* Fix compile error
* Use GGML_LOG instead of GGML_PRINT
* Rename llama_state to llama_logger_state
* Prevent null format string
* Fix whitespace
* Remove log callbacks from ggml backends
* Remove cuda log statement

ggml-ci


Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops. Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before. Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.
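The row-pairing idea in the description above can be illustrated with a small sketch. This is not the Vulkan shader from the commit: it is a plain-Python stand-in that ignores quantization, and the function name `matvec_two_rows` and its arguments are hypothetical. It only shows one loop step producing two result elements, reusing each loaded `b` value across both rows, with a bounds check for the final step when the row count is odd.

```python
def matvec_two_rows(a, b):
    """Multiply matrix a (a list of rows) by vector b, two output rows per step.

    Illustrative only: each outer step stands in for one workgroup.
    """
    num_rows = len(a)
    out = [0.0] * num_rows
    for row0 in range(0, num_rows, 2):
        row1 = row0 + 1
        # Bounds check: the last step may cover only one valid row.
        has_row1 = row1 < num_rows
        acc0 = 0.0
        acc1 = 0.0
        for k, b_k in enumerate(b):
            # b_k is read once and reused for both rows.
            acc0 += a[row0][k] * b_k
            if has_row1:
                acc1 += a[row1][k] * b_k
        out[row0] = acc0
        if has_row1:
            out[row1] = acc1
    return out
```

With three rows, the second step handles only the last row, which is the case the bounds check is there for.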


* metal : add kernel arg structs (wip)
* metal : fattn args ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id ggml-ci
* cont : int safety + register optimizations ggml-ci
* metal : GGML_OP_CONCAT ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h ggml-ci

github-actions bot added testing devops python ggml labels May 23, 2024

pull bot added ⤵️ pull and removed testing devops python ggml labels May 23, 2024

github-actions bot added testing devops python ggml examples android build server SYCL Nvidia GPU Vulkan script Kompute documentation labels May 24, 2024

github-actions bot added Apple Metal nix labels Jun 4, 2024

rgerganov and others added 5 commits October 3, 2024 13:00

rpc : enable vulkan (#9714)

841713e

closes #8536

convert : handle tokenizer merges format from transformers 4.45 (#9696)

e3c355b

ggml-backend : add device description to CPU backend (#9720)

a7ad553

metal : fix compute pass descriptor autorelease crash (#9718)

5d5ab1e

ggerganov and others added 29 commits November 15, 2024 15:44

cmake : fix ppc64 check (whisper/0)

09ecbcb

ggml-ci

ggml : fix some build issues

883d206

scripts: update compare-llama-bench.py (#10319)

4047be7

Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314)

74d73dc

llama : save number of parameters and the size in llama_model (#10286)

89e4caa

fixes #10285

ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)

1e58ee1

vulkan : add cmake preset debug/release (#10306)

dd3a6ce

scripts : fix missing key in compare-llama-bench.py (#10332)

f245cc2

server: (web UI) Add samplers sequence customization (#10255)

bcdb7a2

* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"

Co-authored-by: Xuan Son Nguyen <[email protected]>

make : auto-determine dependencies (#0)

8ee0d09

llamafile : fix include path (#0)

db4cfd5

ggml-ci

llama/ex: remove --logdir argument (#10339)

4e54be0

docs : vulkan build instructions to use git bash mingw64 (#10303)

0fff7fd

scripts : update sync

5c9a8b2

ggml: new optimization interface (ggml/988)

8a43e94

ggml : fix compile warnings (#0)

68fcb47

ggml-ci

tests : remove test-grad0

84274a1

make : add ggml-opt (#0)

a4200ca

ggml-ci

ggml : adapt AMX to tensor->grad removal (#0)

5d9e599

ggml-ci

ggml : inttypes.h -> cinttypes (#0)

24203e9

ggml-ci

ggml : fix possible buffer use after free in sched reserve (#9930)

eda7e1d

CMake: default to -arch=native for CUDA build (#10320)

467576b

CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)

c3ea58a

ggml : fix undefined reference to 'getcpu' (#10354)

a431782

#10352

gitignore : ignore local run scripts [no ci]

20a780c

llama : only use default buffer types for the KV cache (#10358)

be5cacc

CMake: fix typo in comment [no ci] (#10360)

ce2e59b

pull bot merged commit ce2e59b into syther-labs:master Nov 17, 2024
1 check passed
