[Readme] update guidance for llm mode
YWHyuk committed Dec 10, 2024
1 parent 2c08ae4 commit 44b83bd
Showing 3 changed files with 37 additions and 2 deletions.
29 changes: 29 additions & 0 deletions README.md
@@ -46,6 +46,8 @@ $ python3 ./scripts/generate_transformer_onnx.py --model gpt2
$ python3 ./scripts/generate_transformer_onnx.py --model bert
```

## Custom format
ONNXim suppo
------------

## Hardware Configuration
@@ -129,6 +131,33 @@ $ cd ..
$ ./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --model ./example/models_list.json
```

ONNXim supports custom model formats; models such as Llama and OPT are implemented using this feature. An iteration-level scheduling policy is built on top of it.
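
To illustrate what iteration-level scheduling means in general, here is a minimal Python sketch. It is not ONNXim's scheduler: the `Request` class and `run_iteration_level` function are hypothetical names, and the model is reduced to "one iteration produces one token per running request". The `max_batch_size` parameter is assumed to play the same role as the scheduler option of that name in `language_models.json`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical generation request; not an ONNXim type."""
    req_id: int
    tokens_left: int  # output tokens still to generate (assumed positive)


def run_iteration_level(requests, max_batch_size):
    """Simulate iteration-level scheduling; return the batch used at each iteration."""
    waiting = deque(requests)
    running = []
    history = []
    while waiting or running:
        # Admit waiting requests whenever a batch slot is free.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        history.append([r.req_id for r in running])
        # One model iteration produces one token for every running request.
        for r in running:
            r.tokens_left -= 1
        # Finished requests leave at once, freeing slots for the next iteration.
        running = [r for r in running if r.tokens_left > 0]
    return history


if __name__ == "__main__":
    reqs = [Request(0, 3), Request(1, 1), Request(2, 2)]
    for step, batch in enumerate(run_iteration_level(reqs, max_batch_size=2)):
        print(f"iteration {step}: batch {batch}")
```

In this sketch the batch is re-formed at every iteration, so request 2 joins as soon as request 1 finishes instead of waiting for the whole first batch to drain.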

Below is an example of how to run it (**Note**: you have to add the `--mode language` option):

```
$ ./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list example/language_models.json --mode language
```

`language_models.json` is structured as follows:
```
{
  "models": [
    {
      "name": "opt-125m",
      "trace_file": "input.csv",
      "scheduler": "simple",
      "scheduler_config": {
        "max_batch_size": 8
      }
    }
  ]
}
```
- `name`: Specifies which LLM to use.
- `trace_file`: Sets the request trace file.
- `scheduler`: Defines the scheduling policy to use.
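
As a rough illustration of the structure above, the following Python sketch parses a models list and checks each entry for the three documented keys. The helper `load_models_list` and the rule that `max_batch_size` must be a positive integer are assumptions made for this example; they are not part of ONNXim.

```python
import json

REQUIRED_KEYS = ("name", "trace_file", "scheduler")  # keys documented above


def load_models_list(text):
    """Parse a models list and check each entry for the documented keys."""
    config = json.loads(text)
    models = config["models"]
    for i, entry in enumerate(models):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"models[{i}] is missing keys: {missing}")
        # Assumption: max_batch_size, when present, must be a positive integer.
        max_batch = entry.get("scheduler_config", {}).get("max_batch_size")
        if max_batch is not None and (not isinstance(max_batch, int) or max_batch < 1):
            raise ValueError(f"models[{i}] has an invalid max_batch_size: {max_batch!r}")
    return models


EXAMPLE = """
{
  "models": [
    {
      "name": "opt-125m",
      "trace_file": "input.csv",
      "scheduler": "simple",
      "scheduler_config": {"max_batch_size": 8}
    }
  ]
}
"""

if __name__ == "__main__":
    for model in load_models_list(EXAMPLE):
        print(model["name"], model["scheduler"], model["scheduler_config"])
```

Running a check like this before launching the simulator can surface a misspelled key early, though the simulator's own parsing remains the authority on what is accepted.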

------------
## Result

5 changes: 4 additions & 1 deletion models/language_models/opt-125m.json
@@ -7,5 +7,8 @@
"hidden_size" : 768,
"intermediate_size" : 3072,
"ffn_type" : "default",
-"max_seq_length" : 2048
+"max_seq_length" : 2048,
+"run_single_layer": true,
+"tensor_parallel_size" : 1,
+"pipeline_parallel_size" : 1
}
5 changes: 4 additions & 1 deletion models/language_models/opt-66b.json
@@ -7,5 +7,8 @@
"hidden_size" : 9216,
"intermediate_size" : 36864,
"ffn_type" : "default",
-"max_seq_length" : 2048
+"max_seq_length" : 2048,
+"run_single_layer": true,
+"tensor_parallel_size" : 1,
+"pipeline_parallel_size" : 1
}
