Merge pull request #12 from rahuldshetty/gguf_rework
Initial Release of 2.0.0 version
rahuldshetty authored Jun 26, 2024
2 parents 3d06d77 + 69a8d8d commit 1f179ac
Showing 33 changed files with 978 additions and 2,118 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -35,4 +35,5 @@ zig-cache/
node_modules/
dist/
releases/
examples/
examples/
package-lock.json
12 changes: 9 additions & 3 deletions CHANGELOG.md
@@ -4,11 +4,17 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.3] (alpha)
## [2.0.0]

### Added
### Changed

- Introducing new model type: GGUF_CPU, to support running GGUF-compiled models on CPU with LLM.js
- Bumped up Llama.cpp worker to build commit [dd047b4](https://github.com/ggerganov/llama.cpp/tree/dd047b476c8b904e0c25e5dbc5bee6ffde2f6e17)
- Model Caching: Persisting model files in WebWorker's cache to avoid re-downloading model on every load. Thanks to [PR](https://github.com/rahuldshetty/llm.js/pull/3) from [@felladrin](https://github.com/felladrin).

### Removed

- Model Caching: Persisting model files in WebWorker's cache to avoid re-downloading model on every load. Thanks to [PR](https://github.com/rahuldshetty/llm.js/pull/3) from [@felladrin](https://github.com/felladrin).
- Deprecating model types: LLAMA2, DOLLY_V2, GPT_2, GPT_J, GPT_NEO_X, MPT, REPLIT, STARCODER
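
In practice, the deprecation means call sites move from a per-architecture model type to the single GGUF runner. A minimal before/after sketch, using the model types and URLs that appear in this commit's README diff (the callback names are illustrative):

```js
// Callback stubs; the three-callback signature follows the README example in this commit.
const on_loaded = () => {};
const write_result = (text) => console.log(text);
const run_complete = () => {};

// 1.x (removed): one model type per architecture, GGML weights.
// const app = new LLM(
//   'STARCODER',
//   'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
//   on_loaded, write_result, run_complete);

// 2.0.0: the single GGUF_CPU type runs GGUF-compiled models on CPU.
const app = new LLM(
  'GGUF_CPU',
  'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
  on_loaded, write_result, run_complete);
```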

## [1.0.2]

36 changes: 21 additions & 15 deletions README.md
@@ -1,5 +1,5 @@
<div align="center" style="display:flex; align-items:center;justify-content: center;background:#e1e1e1;color:#0f0f0f;padding:50px;">
<img alt="llm.js logo" src="https://raw.githubusercontent.com/rahuldshetty/llm.js/master/docs/_media/logo.png" width="350">
<img alt="llm.js logo" src="https://raw.githubusercontent.com/rahuldshetty/llm.js/master/docs/_media/logo.jpg">
</div>

<p align="center">
@@ -16,22 +16,28 @@
<img alt="Sample" src="https://raw.githubusercontent.com/rahuldshetty/llm.js/master/docs/_media/demo.gif">
</p>

Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/ggml.js-examples/)
Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/llm.js-examples/)

Learn More: [Documentation](https://rahuldshetty.github.io/llm.js/)

Models Supported:
- [llama-cpp (GGUF/GGML)](https://github.com/ggerganov/llama.cpp)
- [LLaMa 2](https://github.com/karpathy/llama2.c)
- [Dolly v2](https://github.com/ggerganov/ggml/tree/master/examples/dolly-v2)
- [GPT2](https://github.com/ggerganov/ggml/tree/master/examples/gpt-2)
- [GPT J](https://github.com/ggerganov/ggml/tree/master/examples/gpt-j)
- [GPT NEO X](https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox)
- [MPT](https://github.com/ggerganov/ggml/tree/master/examples/mpt)
- [Replit](https://github.com/ggerganov/ggml/tree/master/examples/replit)
- [StarCoder](https://github.com/ggerganov/ggml/tree/master/examples/starcoder)

*New models/formats coming soon*
- [TinyLLaMA Series - 1,2,3🦙](https://huggingface.co/TinyLlama)
- [GPT-2](https://huggingface.co/gpt2)
- [Tiny Mistral Series](https://huggingface.co/Locutusque/TinyMistral-248M)
- [Tiny StarCoder Py](https://huggingface.co/bigcode/tiny_starcoder_py)
- [Qwen Models](https://huggingface.co/Qwen)
- [TinySolar](https://huggingface.co/upstage/TinySolar-248m-4k-code-instruct)
- [Pythia](https://github.com/EleutherAI/pythia)
- [Mamba](https://huggingface.co/state-spaces/mamba-130m-hf)
and much more✨

## Features

- Run inference directly in the browser (even on smartphones)
- Developed in pure JavaScript
- Uses a Web Worker for background tasks (model downloading/inference)
- Model Caching support
- Pre-built [packages](https://github.com/rahuldshetty/llm.js/releases) that plug directly into your web apps.

## Installation

@@ -59,10 +65,10 @@ const run_complete = () => {}
// Configure LLM app
const app = new LLM(
// Type of Model
'STARCODER',
'GGUF_CPU',

// Model URL
'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',

// Model Load callback function
on_loaded,
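
The snippet above is cut off by the diff view; what follows is a complete end-to-end sketch, assuming the constructor, `load_worker`, and `run` API described in docs/api_guide.md later in this commit (the prompt and sampling values are illustrative):

```js
// Lifecycle callbacks.
const on_loaded = () => {
  // The model is initialized; kick off a generation.
  app.run({ prompt: 'def fibonacci(n):', max_token_len: 50 });
};
const write_result = (text) => console.log(text); // receives generated text
const run_complete = () => console.log('inference finished');

// Configure the LLM app with the GGUF_CPU runner.
const app = new LLM(
  'GGUF_CPU',
  'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
  on_loaded,
  write_result,
  run_complete
);

// Download the model into the worker's file system and initialize it;
// on_loaded fires once it is ready.
app.load_worker();
```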
41 changes: 0 additions & 41 deletions build-ggml.sh

This file was deleted.

37 changes: 0 additions & 37 deletions build-llama2-c.sh

This file was deleted.

1 change: 0 additions & 1 deletion debug.log

This file was deleted.

20 changes: 4 additions & 16 deletions docs/BUILD.md
@@ -14,25 +14,13 @@ In order to build llm.js, you need to have the following pre-requisites 🛠️
cd llm.js
```
2) Run the ggml build script to generate WASM files:
2) Run the llama build script to generate WASM files:
```
./build-ggml.sh
sh scripts/build-llama-cpp-wasm.sh
```
This script will download the ggml repository and apply llm.js patches to generate WebAssembly bundles for GGML models.
You can find the WebAssembly JS files in the `build/bin/bin/` location.

## Build WebAssembly bundles for Tiny-LLAMA2 models

1) Run the llama2 build script to generate WASM files:
```
./build-llama2-c.sh
```

## Build WebAssembly bundles for LLAMA models

1) Run the llama build script to generate WASM files:
```
./build-llama-cpp.sh
```

This script will download the llama.cpp repository and apply llm.js patches to generate WebAssembly bundles.
You can find the WebAssembly JS files in the `build/llama-bin/bin/` location.
## Package llm.js
31 changes: 19 additions & 12 deletions docs/README.md
@@ -2,19 +2,26 @@

Run Large-Language Models (LLMs) 🚀 directly in your browser!

LLM.js provides JavaScript bindings for interacting with quantized large language models (GGUF/GGML/tiny-llama2).
LLM.js provides JavaScript bindings for interacting with quantized large language models (GGUF).

Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/ggml.js-examples/)
Example projects🌐✨: [Live Demo](https://rahuldshetty.github.io/llm.js-examples/)

Models Supported:
- [llama-cpp (GGUF/GGML)](https://github.com/ggerganov/llama.cpp)
- [LLaMa 2](https://github.com/karpathy/llama2.c)
- [Dolly v2](https://github.com/ggerganov/ggml/tree/master/examples/dolly-v2)
- [GPT2](https://github.com/ggerganov/ggml/tree/master/examples/gpt-2)
- [GPT J](https://github.com/ggerganov/ggml/tree/master/examples/gpt-j)
- [GPT NEO X](https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox)
- [MPT](https://github.com/ggerganov/ggml/tree/master/examples/mpt)
- [Replit](https://github.com/ggerganov/ggml/tree/master/examples/replit)
- [StarCoder](https://github.com/ggerganov/ggml/tree/master/examples/starcoder)
- [TinyLLaMA Series - 1,2,3🦙](https://huggingface.co/TinyLlama)
- [GPT-2](https://huggingface.co/gpt2)
- [Tiny Mistral Series](https://huggingface.co/Locutusque/TinyMistral-248M)
- [Tiny StarCoder Py](https://huggingface.co/bigcode/tiny_starcoder_py)
- [Qwen Models](https://huggingface.co/Qwen)
- [TinySolar](https://huggingface.co/upstage/TinySolar-248m-4k-code-instruct)
- [Pythia](https://github.com/EleutherAI/pythia)
- [Mamba](https://huggingface.co/state-spaces/mamba-130m-hf)
and much more✨

## Features

- Run inference directly in the browser (even on smartphones)
- Developed in pure JavaScript
- Uses a Web Worker for background tasks (model downloading/inference)
- Model Caching support
- Pre-built [packages](https://github.com/rahuldshetty/llm.js/releases) that plug directly into your web apps.

*New models/formats coming soon*
10 changes: 7 additions & 3 deletions docs/TODO.md
@@ -3,6 +3,10 @@

## Bugs

- Caching Bug in browser (after publishing): `SecurityError: The operation is insecure.`

## Enhancements/Ideas
## Enhancements/Ideas
- WebGPU Support for running GGUF models
- Add latest config options (sampling params, json/cfg grammar, lora) for GGUF model runner
- Improve Code structure, wrapper class, etc
- Move to TypeScript?
- ONNX model support (transformers.js already does it better; not sure of the effort)
4 changes: 2 additions & 2 deletions docs/_coverpage.md
@@ -1,8 +1,8 @@
<!-- _coverpage.md -->

<!-- <img src="_media/logo.png" width="400"> 1.0.0 -->
<img src="_media/logo.jpg" > 2.0.0

# LLM.js <small>1.0.2</small>
# LLM.js <small>2.0.0</small>

> Run Large-Language Models (LLMs) 🚀 directly in your browser!
Binary file added docs/_media/logo.jpg
23 changes: 11 additions & 12 deletions docs/api_guide.md
@@ -15,18 +15,17 @@ Model Initializer called during LLM app creation.

Parameter | Description | Example
--- | --- | ---
type | Type of Model. <br> Values:<br>- LLAMA<br>- LLAMA2<br>- DOLLY_V2<br>- GPT_2<br>- GPT_J<br>- GPT_NEO_X<br>- MPT<br>- REPLIT<br>- STARCODER | 'STARCODER'
url | Model URL | [./starcoder.bin](https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin)
type | Type of Model. <br> Values:<br>- GGUF_CPU | 'GGUF_CPU'
url | Model URL | [./tiny_starcoder_py.Q8_0.gguf](https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf)
init_callback | Callback method to run after model initialization. | `() => { console.log('model loaded') }`
write_result_callback | Callback method to print model result. | `(text) => { console.log('model result:' + text) }`
on_complete_callback | Callback method to run after model run. | `() => { console.log('model execution completed') }`
tokenizer_url | Tokenizer URL (supported only for the LLaMa2.c model) | [./tokenizer.url](https://huggingface.co/rahuldshetty/ggml.js/resolve/main/llama2/tokenizer.bin) (Default: null)

Usage:
```js
const app = new LLM(
'STARCODER',
'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
'GGUF_CPU',
'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
()=>{},
(text)=>{console.log(text)},
()=>{}
@@ -35,10 +34,10 @@

### load_worker (Method)

Download and load model binary into WebAssembly's VM File System ⏬📂.
Download and load model binary into WebAssembly's File System ⏬📂.

- Doesn't take any parameters.
- This method should be called before the run method. 🔄
- Models are cached in the browser window.
- After model initialization, the *init_callback* is called.

Usage:
```js
@@ -48,11 +47,11 @@ app.load_worker();

### run (Method)

Call this method to run model inference and generate text 📝.
Call this method to run your prompts and generate a response 📝.

- This method takes an Object Parameter as Input ⚙️.
- Model output can be captured by the write_result_callback constructor method.

- Model output can be captured by the *write_result_callback* method.
- Once inference completes, the *on_complete_callback* is called.

Parameter | Description | Example
--- | --- | ---
@@ -61,7 +60,7 @@ max_token_len (number) | Maximum length of tokens to output. | (Default: 50)
top_k (number) | No. of tokens to consider for model sampling. | (Default: 40)
top_p (number) | Cumulative probability limit for the sampled tokens to consider. | (Default: 0.9)
temp (number) | Parameter to control distribution of model sampling. | (Default: 1.0)
context_size (number) **(ONLY FOR TYPE: LLAMA)** | Set total *context_size* for the model. | (Default: 512)
context_size (number) | Set total *context_size* for the model. | (Default: 512)


Usage:
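The example that followed here is truncated in this view; judging from the parameter table, a call would look roughly like this sketch (the prompt is made up; the numbers are the documented defaults):

```js
app.run({
  prompt: 'Write a python function to check if a number is prime.',
  max_token_len: 50,  // maximum number of tokens to generate
  top_k: 40,          // sample from the 40 most likely tokens
  top_p: 0.9,         // nucleus-sampling cumulative probability cutoff
  temp: 1.0,          // sampling temperature
  context_size: 512   // total context window for the model
});
```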
9 changes: 3 additions & 6 deletions docs/examples.md
@@ -1,11 +1,8 @@
# Examples

Get started with some example [projects](https://rahuldshetty.github.io/ggml.js-examples/).
Get started with some example [projects](https://rahuldshetty.github.io/llm.js-examples/).

| Name | Description | Source code |
|-------------------|----------------------------------|-------------------------------|
| [Playground](https://rahuldshetty.github.io/ggml.js-examples/playground.html) | Playground Example | [link](https://github.com/rahuldshetty/ggml.js-examples/tree/master/playground.html) |
| [Quick Start](https://rahuldshetty.github.io/ggml.js-examples/quick-start/index.html) | Basic demo example | [link](https://github.com/rahuldshetty/ggml.js-examples/tree/master/quick-start) |
| [Tiny Starcoder](https://rahuldshetty.github.io/ggml.js-examples/starcoder.html) | Demo on [tiny starcoder](https://huggingface.co/bigcode/tiny_starcoder_py) | [link](https://github.com/rahuldshetty/ggml.js-examples/blob/master/starcoder.html) |
| [GPT 2 Roleplay](https://rahuldshetty.github.io/ggml.js-examples/gpt2_roleplay.html) | Demo on [GPT2-RPGPT-8.48M](https://huggingface.co/xzuyn/GPT2-RPGPT-8.48M) | [link](https://github.com/rahuldshetty/ggml.js-examples/blob/master/gpt2_roleplay.html) |
| [LLaMa2 TinyStories](https://rahuldshetty.github.io/ggml.js-examples/llama2_tinystories.html) | Demo on [LLaMa2.C](https://github.com/karpathy/llama2.c) | [link](https://github.com/rahuldshetty/ggml.js-examples/blob/master/llama2_tinystories.html) |
| [Playground](https://rahuldshetty.github.io/llm.js-examples/playground.html) | Playground Example | [link](https://github.com/rahuldshetty/llm.js-examples/tree/master/playground.html) |
| [Quick Start](https://rahuldshetty.github.io/llm.js-examples/quick-start/index.html) | Basic demo example | [link](https://github.com/rahuldshetty/llm.js-examples/tree/master/quick-start) |
11 changes: 4 additions & 7 deletions docs/quick_start.md
@@ -8,11 +8,8 @@ After extracting the zip file, you will find the package structure like this:
.
└── llm.js/
├── wasm/
│ ├── dollyv2.js
│ ├── gpt2.js
│ └── ...
├── ggml.js
├── 19.js
│ └── llamacpp-cpu.js
├── llm.js
└── ...
```

@@ -58,10 +55,10 @@ const run_complete = () => {}
// Configure LLM app
const app = new LLM(
// Type of Model
'STARCODER',
'GGUF_CPU',

// Model URL
'https://huggingface.co/rahuldshetty/ggml.js/resolve/main/starcoder.bin',
'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',

// Model Load callback function
on_loaded,
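
The quick-start snippet is truncated here as well; to make the callbacks concrete, here is one way an app might accumulate the streamed output into the page (a sketch — the element id is hypothetical, and it assumes *write_result_callback* receives incremental chunks of text):

```js
let output = '';

const on_loaded = () => app.run({ prompt: 'def hello_world():', max_token_len: 50 });

// Append each chunk rather than replacing, since results stream in.
const write_result = (text) => {
  output += text;
  document.getElementById('result').innerText = output; // hypothetical element id
};

const run_complete = () => console.log(`done: generated ${output.length} characters`);

const app = new LLM(
  'GGUF_CPU',
  'https://huggingface.co/RichardErkhov/bigcode_-_tiny_starcoder_py-gguf/resolve/main/tiny_starcoder_py.Q8_0.gguf',
  on_loaded, write_result, run_complete);

app.load_worker();
```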