
Update stories cmd to use kv cache (#5460)
Summary: Pull Request resolved: #5460

Reviewed By: dvorjackz

Differential Revision: D62925331

Pulled By: lucylq

fbshipit-source-id: a5c977055fe208cd8f1db20f147247a5a0f6fdbf
lucylq authored and facebook-github-bot committed Sep 18, 2024
1 parent 0648a8a commit d2a38cc
Showing 1 changed file with 2 additions and 2 deletions.
examples/models/llama2/README.md: 2 additions & 2 deletions

@@ -66,7 +66,7 @@ Note that since Llama3's vocabulary size is 4x that of Llama2, we had to quantiz
|OnePlus 12 | 10.85 tokens/second | 11.02 tokens/second |

### Llama3.1
-> :warning: **use the main branch**: Llama3.1 is supported on the ExecuTorch main branch (not release 0.3).
+Llama3.1 is supported on the ExecuTorch main branch and release/0.4

# Instructions

@@ -117,7 +117,7 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
```
3. Export model and generate `.pte` file.
```
-python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X
+python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
```
4. Create tokenizer.bin.
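The change above adds `-kv` so the exported stories model uses a KV cache. As background, a minimal sketch of the idea (not the ExecuTorch implementation; all names and the single-head NumPy setup here are illustrative assumptions): during autoregressive decoding, the keys and values of past tokens are stored once and reused, so each step only projects the new token instead of re-running attention projections over the whole prefix.

```python
import numpy as np

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    # q: (1, d); k, v: (t, d). Returns (1, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v

class KVCache:
    """Accumulates per-token keys/values so a decode step costs O(t),
    rather than recomputing all projections for the full prefix."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        # Return the full cached history as (t, d) arrays.
        return np.stack(self.keys), np.stack(self.values)

def decode_step(cache, x, wq, wk, wv):
    # Project ONLY the new token x: (d,); reuse cached K/V for the rest.
    q = x @ wq
    k, v = cache.append(x @ wk, x @ wv)
    return attention(q[None, :], k, v)
```

With the cache, the result of each step matches a full recompute over the prefix, which is exactly why enabling it changes speed but not outputs.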
