Commit
pierre.delaunay committed Jan 13, 2025
1 parent 9ee81d7, commit f09f396
Showing 16 changed files with 257 additions and 121 deletions.
File renamed without changes.
@@ -0,0 +1,49 @@
Design
======

Milabench aims to simulate research workloads for benchmarking purposes.

* Performance is measured as throughput (samples per second).
  For example, for a model like resnet the throughput is measured in images per second.

* Single GPU workloads are spawned once per GPU to ensure the entire machine is used,
  simulating something similar to a hyperparameter search.
  The performance of the benchmark is the sum of the throughputs of all processes.

* Multi GPU workloads

* Multi Nodes


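The aggregation rule for single GPU workloads can be illustrated with a minimal sketch. The function names and the sample numbers below are hypothetical and are not taken from the milabench code base; the sketch only shows the arithmetic of summing per-process throughputs.

```python
def throughput(samples: int, seconds: float) -> float:
    """Throughput of one process, in samples per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return samples / seconds


def benchmark_score(per_gpu_runs):
    """Sum the throughput of every per-GPU process.

    per_gpu_runs is a list of (samples_processed, elapsed_seconds) pairs,
    one pair per GPU process (a hypothetical layout chosen for this sketch).
    """
    return sum(throughput(samples, seconds) for samples, seconds in per_gpu_runs)


# Hypothetical numbers: four GPUs, each running its own copy of the workload.
runs = [(6400, 10.0), (6400, 10.0), (6200, 10.0), (6000, 10.0)]
score = benchmark_score(runs)
print(score)
```

Summing, rather than averaging, means a machine with more GPUs gets a proportionally higher score for the same workload, which matches the goal of measuring the whole machine.
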
Run
---

* Milabench Manager Process
  * Handles messages from benchmark processes
  * Saves messages into a file for future analysis

* Benchmark processes
  * Run using ``voir``
  * voir is configured to intercept and send events during the training process
  * This allows us to add models from git repositories without modification
  * voir sends data through a file descriptor that was created by the milabench main process


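The message flow described above can be sketched with standard library calls only. This is not voir's or milabench's actual implementation: the child script, the event fields, and the ``run_benchmark`` helper are all assumptions made for illustration. The sketch shows a manager process that creates a pipe, hands the write end to a child process, and saves every received message to a file. It relies on ``pass_fds``, so it is POSIX only.

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for an instrumented benchmark (hypothetical): it writes one JSON
# event per line to the file descriptor whose number it receives as argv[1].
CHILD_CODE = """
import json, os, sys
fd = int(sys.argv[1])
with os.fdopen(fd, "w") as out:
    for step in range(3):
        out.write(json.dumps({"event": "rate", "step": step, "rate": 100.0 + step}) + "\\n")
"""


def run_benchmark(log_path):
    """Spawn one benchmark process and save its events to log_path."""
    read_fd, write_fd = os.pipe()  # the pipe is created by the manager process
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, str(write_fd)],
        pass_fds=(write_fd,),  # let the child inherit the write end
    )
    os.close(write_fd)  # the manager only reads from the pipe

    events = []
    with os.fdopen(read_fd) as stream, open(log_path, "w") as log:
        for line in stream:  # the loop ends when the child closes its end
            log.write(line)  # keep the raw message for future analysis
            events.append(json.loads(line))
    proc.wait()
    return events


log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
events = run_benchmark(log_path)
print(len(events), "events received")
```

Passing a dedicated file descriptor keeps the event stream separate from the benchmark's own stdout and stderr, so the benchmark script itself does not need to be modified.
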
What milabench is
-----------------

* Training focused
* milabench shows candid performance numbers
  * No optimization beyond batch size scaling is performed
  * We want to measure the performance our researchers will see,
    not the performance they could get.
* PyTorch centric
  * PyTorch has become the de facto library for research
  * We are looking for accelerators with good maturity that can support
    this framework with limited code changes.


What milabench is not
---------------------

* The goal of milabench is not to be a performance showcase for an accelerator.
File renamed without changes.
File renamed without changes.
docs/new_benchmarks.rst → docs/Contributing/new_benchmarks.rst
4 changes: 2 additions & 2 deletions
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,4 @@
Changelog
=========

TBD
@@ -0,0 +1,54 @@
Features
========

* Non-intrusive instrumentation
* Validation layers
* Automatic batch resizing
* Docker
* Hardware
  * ROCm 5.7
  * NVIDIA
* Metrics gathering
  * Performance throughput
  * GPU utilization
  * CPU utilization
  * IO utilization


Benchmarks
----------

.. code-block:: text

    +--------------------------+-----------+-----------+-------------+-----------+-------------------+
    | Benchmark                | Unit      | Domain    | Network     | Focus     | Task              |
    +==========================+===========+===========+=============+===========+===================+
    | bf16                     | TFlops    | Synthetic |             | Training  |                   |
    | fp16                     | TFlops    | Synthetic |             | Training  |                   |
    | tf32                     | TFlops    | Synthetic |             | Training  |                   |
    | fp32                     | TFlops    | Synthetic |             | Training  |                   |
    | bert-fp16                |           | NLP       | Transformer | Training  | Language Modeling |
    | bert-fp32                |           | NLP       | Transformer | Training  | Language Modeling |
    | bert-tf32                |           | NLP       | Transformer | Training  | Language Modeling |
    | bert-tf32-fp16           |           | NLP       | Transformer | Training  | Language Modeling |
    | opt-1_3b                 |           | NLP       | Transformer | Training  | Language Modeling |
    | opt-6_7b                 |           | NLP       | Transformer | Training  | Language Modeling |
    | reformer                 |           | NLP       | Transformer | Training  | Language Modeling |
    | rwkv                     |           | NLP       | RNN         | Training  | Language Modeling |
    | llama                    | Token/sec | NLP       | Transformer | Inference | Generation        |
    | dlrm                     |           | NLP       |             | Training  | Recommendation    |
    | convnext_large-fp16      | img/sec   | Vision    | Convolution | Training  | Classification    |
    | convnext_large-fp32      | img/sec   | Vision    | Convolution | Training  | Classification    |
    | convnext_large-tf32      | img/sec   | Vision    | Convolution | Training  | Classification    |
    | convnext_large-tf32-fp16 | img/sec   | Vision    | Convolution | Training  | Classification    |
    | davit_large              | img/sec   | Vision    | Transformer | Training  | Classification    |
    | focalnet                 |           | Vision    | Convolution | Training  | Classification    |
    | davit_large-multi        | img/sec   | Vision    | Transformer | Training  | Classification    |
    | regnet_y_128gf           | img/sec   | Vision    | Convolution | Training  | Classification    |
    | resnet152                | img/sec   | Vision    | Convolution | Training  | Classification    |
    | resnet152-multi          | img/sec   | Vision    | Convolution | Training  | Classification    |
    | resnet50                 | img/sec   | Vision    | Convolution | Training  | Classification    |
    | stargan                  | img/sec   | Vision    | Convolution | Training  | GAN               |
    | super-slomo              | img/sec   | Vision    | Convolution | Training  |                   |
    | t5                       |           | NLP       | Transformer | Training  |                   |
    | whisper                  |           | Audio     |             | Training  |                   |
    +--------------------------+-----------+-----------+-------------+-----------+-------------------+
@@ -0,0 +1,10 @@
Roadmap
=======

* Cloud CI
* ROCm 6.0 - MI300 support
* GPU Max Series - 1550 support
* Evaluate suitability
  * Tenstorrent
  * Graphcore
  * Cerebras