Merge pull request #12 from rahuldshetty/gguf_rework
Initial Release of 2.0.0 version
rahuldshetty authored Jun 26, 2024
2 parents 3d06d77 + 69a8d8d commit 1f179ac
Showing 33 changed files with 978 additions and 2,118 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -35,4 +35,5 @@ zig-cache/
node_modules/
dist/
releases/
examples/
examples/
package-lock.json
12 changes: 9 additions & 3 deletions CHANGELOG.md
@@ -4,11 +4,17 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.3] (alpha)
## [2.0.0]

### Added
### Changed

- Introducing new model type: GGUF_CPU, to support running GGUF-compiled models on CPU with LLM.js
- Bumped up Llama.cpp worker to build commit [dd047b4](https://github.com/ggerganov/llama.cpp/tree/dd047b476c8b904e0c25e5dbc5bee6ffde2f6e17)
- Model Caching: Persisting model files in WebWorker's cache to avoid re-downloading model on every load. Thanks to [PR](https://github.com/rahuldshetty/llm.js/pull/3) from [@felladrin](https://github.com/felladrin).

### Removed

- Model Caching: Persisting model files in WebWorker's cache to avoid re-downloading model on every load. Thanks to [PR](https://github.com/rahuldshetty/llm.js/pull/3) from [@felladrin](https://github.com/felladrin).
- Deprecating model types: LLAMA2, DOLLY_V2, GPT_2, GPT_J, GPT_NEO_X, MPT, REPLIT, STARCODER
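
In practice, the deprecation means call sites move from a per-architecture model type to the single GGUF runner. A minimal before/after sketch, using the model types and URLs that appear in this commit's README diff (the callback names are illustrative):

```js
// Callback stubs; the three-callback signature follows the README example in this commit.
const on_loaded = () => {};
const write_result = (text) => console.log(text);
const run_complete = () => {};

// 1.x (removed): one model type per architecture, GGML weights.
// const app = new LLM(
//   'STARCODER',
//   'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
//   on_loaded, write_result, run_complete);

// 2.0.0: the single GGUF_CPU type runs GGUF-compiled models on CPU.
const app = new LLM(
  'GGUF_CPU',
  'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
  on_loaded, write_result, run_complete);
```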

## [1.0.2]

36 changes: 21 additions & 15 deletions README.md
@@ -1,5 +1,5 @@
<div align="center" style="display:flex; align-items:center;justify-content: center;background:#e1e1e1;color:#0f0f0f;padding:50px;">
<img alt="llm.js logo" src="https://raw.githubusercontent.com/rahuldshetty/llm.js/master/docs/_media/logo.png" width="350">
<img alt="llm.js logo" src="https://raw.githubusercontent.com/rahuldshetty/llm.js/master/docs/_media/logo.jpg">
</div>

<p align="center">
@@ -16,22 +16,28 @@
<img alt="Sample" src="https://raw.githubusercontent.com/rahuldshetty/llm.js/master/docs/_media/demo.gif">
</p>

Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/ggml.js-examples/)
Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/llm.js-examples/)

Learn More: [Documentation](https://rahuldshetty.github.io/llm.js/)

Models Supported:
- [llama-cpp (GGUF/GGML)](https://github.com/ggerganov/llama.cpp)
- [LLaMa 2](https://github.com/karpathy/llama2.c)
- [Dolly v2](https://github.com/ggerganov/ggml/tree/master/examples/dolly-v2)
- [GPT2](https://github.com/ggerganov/ggml/tree/master/examples/gpt-2)
- [GPT J](https://github.com/ggerganov/ggml/tree/master/examples/gpt-j)
- [GPT NEO X](https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox)
- [MPT](https://github.com/ggerganov/ggml/tree/master/examples/mpt)
- [Replit](https://github.com/ggerganov/ggml/tree/master/examples/replit)
- [StarCoder](https://github.com/ggerganov/ggml/tree/master/examples/starcoder)

*New models/formats coming soon*
- [TinyLLaMA Series - 1,2,3🦙](https://huggingface.co/TinyLlama)
- [GPT-2](https://huggingface.co/gpt2)
- [Tiny Mistral Series](https://huggingface.co/Locutusque/TinyMistral-248M)
- [Tiny StarCoder Py](https://huggingface.co/bigcode/tiny_starcoder_py)
- [Qwen Models](https://huggingface.co/Qwen)
- [TinySolar](https://huggingface.co/upstage/TinySolar-248m-4k-code-instruct)
- [Pythia](https://github.com/EleutherAI/pythia)
- [Mamba](https://huggingface.co/state-spaces/mamba-130m-hf)
and much more✨

## Features

- Run inference directly in the browser (even on smartphones)
- Developed in pure JavaScript
- Uses a Web Worker for background tasks (model downloading/inference)
- Model Caching support
- Pre-built [packages](https://github.com/rahuldshetty/llm.js/releases) that plug directly into your web apps.

## Installation

@@ -59,10 +65,10 @@ const run_complete = () => {}
// Configure LLM app
const app = new LLM(
// Type of Model
'STARCODER',
'GGUF_CPU',

// Model URL
'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',

// Model Load callback function
on_loaded,
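
The snippet above is cut off by the diff view; what follows is a complete end-to-end sketch, assuming the constructor, `load_worker`, and `run` API described in docs/api_guide.md later in this commit (the prompt and sampling values are illustrative):

```js
// Lifecycle callbacks.
const on_loaded = () => {
  // The model is initialized; kick off a generation.
  app.run({ prompt: 'def fibonacci(n):', max_token_len: 50 });
};
const write_result = (text) => console.log(text); // receives generated text
const run_complete = () => console.log('inference finished');

// Configure the LLM app with the GGUF_CPU runner.
const app = new LLM(
  'GGUF_CPU',
  'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
  on_loaded,
  write_result,
  run_complete
);

// Download the model into the worker's file system and initialize it;
// on_loaded fires once it is ready.
app.load_worker();
```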
41 changes: 0 additions & 41 deletions build-ggml.sh

This file was deleted.

37 changes: 0 additions & 37 deletions build-llama2-c.sh

This file was deleted.

1 change: 0 additions & 1 deletion debug.log

This file was deleted.

20 changes: 4 additions & 16 deletions docs/BUILD.md
@@ -14,25 +14,13 @@ In order to build llm.js, you need to have the following pre-requisites 🛠️
cd llm.js
```
2) Run the ggml build script to generate WASM files:
2) Run the llama build script to generate WASM files:
```
./build-ggml.sh
sh scripts/build-llama-cpp-wasm.sh
```
This script will download the ggml repository and apply llm.js patches to generate WebAssembly bundles for GGML models.
You can find the WebAssembly JS files in the `build/bin/bin/` location.

## Build WebAssembly bundles for Tiny-LLAMA2 models

1) Run the llama2 build script to generate WASM files:
```
./build-llama2-c.sh
```

## Build WebAssembly bundles for LLAMA models

1) Run the llama build script to generate WASM files:
```
./build-llama-cpp.sh
```

This script will download the llama.cpp repository and apply llm.js patches to generate WebAssembly bundles.
You can find the WebAssembly JS files in the `build/llama-bin/bin/` location.
## Package llm.js
31 changes: 19 additions & 12 deletions docs/README.md
@@ -2,19 +2,26 @@

Run Large-Language Models (LLMs) 🚀 directly in your browser!

LLM.js provides JavaScript bindings for interacting with quantized large language models (GGUF/GGML/tiny-llama2).
LLM.js provides JavaScript bindings for interacting with quantized large language models (GGUF).

Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/ggml.js-examples/)
Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/llm.js-examples/)

Models Supported:
- [llama-cpp (GGUF/GGML)](https://github.com/ggerganov/llama.cpp)
- [LLaMa 2](https://github.com/karpathy/llama2.c)
- [Dolly v2](https://github.com/ggerganov/ggml/tree/master/examples/dolly-v2)
- [GPT2](https://github.com/ggerganov/ggml/tree/master/examples/gpt-2)
- [GPT J](https://github.com/ggerganov/ggml/tree/master/examples/gpt-j)
- [GPT NEO X](https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox)
- [MPT](https://github.com/ggerganov/ggml/tree/master/examples/mpt)
- [Replit](https://github.com/ggerganov/ggml/tree/master/examples/replit)
- [StarCoder](https://github.com/ggerganov/ggml/tree/master/examples/starcoder)
- [TinyLLaMA Series - 1,2,3🦙](https://huggingface.co/TinyLlama)
- [GPT-2](https://huggingface.co/gpt2)
- [Tiny Mistral Series](https://huggingface.co/Locutusque/TinyMistral-248M)
- [Tiny StarCoder Py](https://huggingface.co/bigcode/tiny_starcoder_py)
- [Qwen Models](https://huggingface.co/Qwen)
- [TinySolar](https://huggingface.co/upstage/TinySolar-248m-4k-code-instruct)
- [Pythia](https://github.com/EleutherAI/pythia)
- [Mamba](https://huggingface.co/state-spaces/mamba-130m-hf)
and much more✨

## Features

- Run inference directly in the browser (even on smartphones)
- Developed in pure JavaScript
- Uses a Web Worker for background tasks (model downloading/inference)
- Model Caching support
- Pre-built [packages](https://github.com/rahuldshetty/llm.js/releases) that plug directly into your web apps.

*New models/formats coming soon*
10 changes: 7 additions & 3 deletions docs/TODO.md
@@ -3,6 +3,10 @@

## Bugs

- Caching Bug in browser (after publishing): `SecurityError: The operation is insecure.`

## Enhancements/Ideas
## Enhancements/Ideas
- WebGPU Support for running GGUF models
- Add latest config options (sampling params, json/cfg grammar, lora) for GGUF model runner
- Improve Code structure, wrapper class, etc
- Move to TypeScript?
- ONNX model support (transformers.js already does it better; not sure of the effort)
4 changes: 2 additions & 2 deletions docs/_coverpage.md
@@ -1,8 +1,8 @@
<!-- _coverpage.md -->

<!-- <img src="_media/logo.png" width="400"> 1.0.0 -->
<img src="_media/logo.jpg" > 2.0.0

# LLM.js <small>1.0.2</small>
# LLM.js <small>2.0.0</small>

> Run Large-Language Models (LLMs) 🚀 directly in your browser!
Binary file added docs/_media/logo.jpg
23 changes: 11 additions & 12 deletions docs/api_guide.md
@@ -15,18 +15,17 @@ Model Initializer called during LLM app creation.

Parameter | Description | Example
--- | --- | ---
type | Type of Model. <br> Values:<br>- LLAMA<br>- LLAMA2<br>- DOLLY_V2<br>- GPT_2<br>- GPT_J<br>- GPT_NEO_X<br>- MPT<br>- REPLIT<br>- STARCODER | 'STARCODER'
url | Model URL | [./starcoder.bin](https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin)
type | Type of Model. <br> Values:<br>- GGUF_CPU | 'GGUF_CPU'
url | Model URL | [./tiny_starcoder_py.Q8_0.gguf](https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf)
init_callback | Callback method to run after model initialization. | `() => { console.log('model loaded') }`
write_result_callback | Callback method to print model result. | `(text) => { console.log('model result:' + text) }`
on_complete_callback | Callback method to run after model run. | `() => { console.log('model execution completed') }`
tokenizer_url | Tokenizer URL (supported only for the LLaMa2.c model) | [./tokenizer.url](https://huggingface.co/rahuldshetty/ggml.js/resolve/main/llama2/tokenizer.bin) (Default: null)

Usage:
```js
const app = new LLM(
'STARCODER',
'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
'GGUF_CPU',
'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
()=>{},
(text)=>{console.log(text)},
()=>{}
@@ -35,10 +34,10 @@

### load_worker (Method)

Download and load model binary into WebAssembly's VM File System ⏬📂.
Download and load model binary into WebAssembly's File System ⏬📂.

- Doesn't take any parameters.
- This method should be called before the run method. 🔄
- Models are cached in the browser window.
- After model initialization, the *init_callback* is called.

Usage:
```js
@@ -48,11 +47,11 @@ app.load_worker();

### run (Method)

Call this method to run model inference and generate text 📝.
Call this method to run your prompts and generate a response 📝.

- This method takes an Object Parameter as Input ⚙️.
- Model output can be captured by the write_result_callback constructor method.

- Model output can be captured by the *write_result_callback* method.
- Once inference completes, the *on_complete_callback* is called.

Parameter | Description | Example
--- | --- | ---
@@ -61,7 +60,7 @@ max_token_len (number) | Maximum length of tokens to output. | (Default: 50)
top_k (number) | No. of tokens to consider for model sampling. | (Default: 40)
top_p (number) | Cumulative probability limit for the sampled tokens to consider. | (Default: 0.9)
temp (number) | Parameter to control distribution of model sampling. | (Default: 1.0)
context_size (number) **(ONLY FOR TYPE: LLAMA)** | Set total *context_size* for the model. | (Default: 512)
context_size (number) | Set total *context_size* for the model. | (Default: 512)


Usage:
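The example that followed here is truncated in this view; judging from the parameter table, a call would look roughly like this sketch (the prompt is made up; the numbers are the documented defaults):

```js
app.run({
  prompt: 'Write a python function to check if a number is prime.',
  max_token_len: 50,  // maximum number of tokens to generate
  top_k: 40,          // sample from the 40 most likely tokens
  top_p: 0.9,         // nucleus-sampling cumulative probability cutoff
  temp: 1.0,          // sampling temperature
  context_size: 512   // total context window for the model
});
```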
9 changes: 3 additions & 6 deletions docs/examples.md
@@ -1,11 +1,8 @@
# Examples

Get started with some example [projects](https://rahuldshetty.github.io/ggml.js-examples/).
Get started with some example [projects](https://rahuldshetty.github.io/llm.js-examples/).

| Name | Description | Source code |
|-------------------|----------------------------------|-------------------------------|
| [Playground](https://rahuldshetty.github.io/ggml.js-examples/playground.html) | Playground Example | [link](https://github.com/rahuldshetty/ggml.js-examples/tree/master/playground.html) |
| [Quick Start](https://rahuldshetty.github.io/ggml.js-examples/quick-start/index.html) | Basic demo example | [link](https://github.com/rahuldshetty/ggml.js-examples/tree/master/quick-start) |
| [Tiny Starcoder](https://rahuldshetty.github.io/ggml.js-examples/starcoder.html) | Demo on [tiny starcoder](https://huggingface.co/bigcode/tiny_starcoder_py) | [link](https://github.com/rahuldshetty/ggml.js-examples/blob/master/starcoder.html) |
| [GPT 2 Roleplay](https://rahuldshetty.github.io/ggml.js-examples/gpt2_roleplay.html) | Demo on [GPT2-RPGPT-8.48M](https://huggingface.co/xzuyn/GPT2-RPGPT-8.48M) | [link](https://github.com/rahuldshetty/ggml.js-examples/blob/master/gpt2_roleplay.html) |
| [LLaMa2 TinyStories](https://rahuldshetty.github.io/ggml.js-examples/llama2_tinystories.html) | Demo on [LLaMa2.C](https://github.com/karpathy/llama2.c) | [link](https://github.com/rahuldshetty/ggml.js-examples/blob/master/llama2_tinystories.html) |
| [Playground](https://rahuldshetty.github.io/llm.js-examples/playground.html) | Playground Example | [link](https://github.com/rahuldshetty/llm.js-examples/tree/master/playground.html) |
| [Quick Start](https://rahuldshetty.github.io/llm.js-examples/quick-start/index.html) | Basic demo example | [link](https://github.com/rahuldshetty/llm.js-examples/tree/master/quick-start) |
11 changes: 4 additions & 7 deletions docs/quick_start.md
@@ -8,11 +8,8 @@ After extracting the zip file, you will find the package structure like this:
.
└── llm.js/
├── wasm/
│ ├── dollyv2.js
│ ├── gpt2.js
│ └── ...
├── ggml.js
├── 19.js
│ └── llamacpp-cpu.js
├── llm.js
└── ...
```

@@ -58,10 +55,10 @@ const run_complete = () => {}
// Configure LLM app
const app = new LLM(
// Type of Model
'STARCODER',
'GGUF_CPU',

// Model URL
'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',

// Model Load callback function
on_loaded,
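
The quick-start snippet is truncated here as well; to make the callbacks concrete, here is one way an app might accumulate the streamed output into the page (a sketch — the element id is hypothetical, and it assumes *write_result_callback* receives incremental chunks of text):

```js
let output = '';

const on_loaded = () => app.run({ prompt: 'def hello_world():', max_token_len: 50 });

// Append each chunk rather than replacing, since results stream in.
const write_result = (text) => {
  output += text;
  document.getElementById('result').innerText = output; // hypothetical element id
};

const run_complete = () => console.log(`done: generated ${output.length} characters`);

const app = new LLM(
  'GGUF_CPU',
  'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
  on_loaded, write_result, run_complete);

app.load_worker();
```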