Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Profiling] Model and device specific folders for profiled data, documentation for profiling #7

Merged
merged 26 commits into from
May 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (*link existing issues this PR will resolve*)

**BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**

---

<details>
<!-- inside this <details> section, markdown rendering does not work, so we use raw html here. -->
<summary><b> PR Checklist (Click to Expand) </b></summary>

<p>Thank you for your contribution to Vidur! Before submitting the pull request, please ensure the PR meets the following criteria. This helps Vidur maintain the code quality and improve the efficiency of the review process.</p>

<h3>PR Title and Classification</h3>
<p>Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:</p>
<ul>
<li><code>[Bugfix]</code> for bug fixes.</li>
<li><code>[CI/Build]</code> for build or continuous integration improvements.</li>
<li><code>[Doc]</code> for documentation fixes and improvements.</li>
<li><code>[Model]</code> for adding a new model or improving an existing model. Model name should appear in the title.</li>
<li><code>[Profiling]</code> For changes on the profiling module. </li>
<li><code>[Core]</code> for changes in the core simulator logic </li>
<li><code>[Misc]</code> for PRs that do not fit the above categories. Please use this sparingly.</li>
</ul>
<p><strong>Note:</strong> If the PR spans more than one category, please include all relevant prefixes.</p>

<h3>Code Quality</h3>

<p>The PR need to meet the following code quality standards:</p>

<ul>
<li>Pass all linter checks. Please use <code>make format</code></a> to format your code.</li>
<li>The code need to be well-documented to ensure future contributors can easily understand the code.</li>
<li>Please add documentation to <code>docs/source/</code> if the PR modifies the user-facing behaviors of Vidur. It helps user understand and utilize the new features or changes.</li>
</ul>

<h3>Notes for Large Changes</h3>
<p>Please keep the changes as concise as possible. For major architectural changes (>500 LOC), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with <code>rfc-required</code> and might not go through the PR.</p>

<h3>Thank You</h3>

<p> Finally, thank you for taking the time to read these guidelines and for your interest in contributing to Vidur. Your contributions make Vidur a great tool for everyone! </p>


</details>
22 changes: 16 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,12 @@
# Vidur: LLM Inference Simulator

We highly recommend reading the [Vidur: A Large-Scale Simulation Framework for LLM Inference](https://www.microsoft.com/en-us/research/publication/vidur-a-large-scale-simulation-framework-for-llm-inference/) paper first before going into the code.
Vidur is a high-fidelity LLM inference simulator, designed to aid capacity planning and deployment configuration optimization. Please refer to our [MLSys'24 paper](https://arxiv.org/abs/2405.05465) for more details.<br>
We have a [live demo](https://vidur.westus2.cloudapp.azure.com/) that captures the capabilities of the system.

![Simulator Fidelity](./assets/dynamic_fidelity_v8_request_e2e_time_normalized_85_p95.jpeg)
*Difference in 95th percentile Request E2E Normalized time showing fidelity of Vidur's execution time predictions across four models and three dynamic workload traces, using request load at 85% of the maximum serving capacity for each scenario.*
![Config Search](./assets/llama70b_Chat1M_ttft_tbt_90_99_2.0_0.2.jpeg)
*Capacity per dollar for different deployment configurations vs TTFT-P90 (left) and TBT-P99 (middle) for LLaMA2-70B.*

## Setup

Expand Down Expand Up @@ -41,10 +47,10 @@ wandb login --host https://<your-org>.wandb.io

To opt out of wandb, pick any one of the following methods:

1. `export WANDB_MODE=disabled` in your shell or add this in `~/.zshrc` or `~/.bashrc`. Remeber to reload using `source ~/.zshrc`.
2. Set `wandb_project` and `wandb_group` as `""` in `simulator/config/default.yml`. Also remove these CLI params from the shell command with which the simulator is invoked.
1. `export WANDB_MODE=disabled` in your shell or add this in `~/.zshrc` or `~/.bashrc`. Remember to reload using `source ~/.zshrc`.
2. Set `wandb_project` and `wandb_group` as `""` in `simulator/config/default.yml`. Also, remove these CLI params from the shell command with which the simulator is invoked.

## Running simulator
## Running the simulator

To run the simulator, execute the following command from the repository root,

Expand Down Expand Up @@ -78,9 +84,13 @@ python -m simulator.main \
--vllm_scheduler_max_tokens_in_batch 4096
```

The simulator supports a plethora of parameters for the simulation description of which can be found [here](docs/simulator_params.md).
The simulator supports a plethora of parameters for the simulation description which can be found [here](simulator/config/README.md).

The metrics will be logged to wandb directly and a copy will be stored in the `simulator_output` directory along with the chrome trace. A description of all the logged metrics can be found [here](simulator/metrics/README.md).

## Adding a new model

The metrics will be logged to wandb directly and a copy will be stored in the `simulator_output` directory along with the chrome trace. A description of all the logged metrics can be found [here](docs/simulator_metrics.md).
Instructions on adding a new model can be found [here](simulator/profiling/README.md).

## Formatting Code

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading