Move documentation around
pierre.delaunay committed Jan 13, 2025
1 parent 9ee81d7 commit 8407dca
Showing 16 changed files with 150 additions and 8 deletions.
File renamed without changes.
49 changes: 49 additions & 0 deletions docs/Contributing/design.rst
@@ -0,0 +1,49 @@
Design
======

Milabench aims to simulate research workloads for benchmarking purposes.

* Performance is measured as throughput (samples per second).
  For example, for a model like resnet the throughput is measured in images per second.

* Single-GPU workloads are spawned once per GPU to ensure the entire machine is used,
  simulating something similar to a hyperparameter search.
  The performance of the benchmark is the sum of the throughput of each process.

* Multi GPU workloads

* Multi Nodes
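
The single-GPU aggregation described above can be sketched as follows. This is only an illustration of the arithmetic, not milabench code: the function names and the sample numbers are hypothetical.

```python
# Hypothetical illustration; none of these names come from milabench.

def throughput(samples: int, seconds: float) -> float:
    """Samples processed per second for one benchmark process."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return samples / seconds


def machine_throughput(per_process: list[tuple[int, float]]) -> float:
    """Sum the throughput of every single-GPU process on the machine."""
    return sum(throughput(n, t) for n, t in per_process)


# Four GPUs, each running its own copy of the benchmark:
# (samples processed, elapsed seconds) for each process.
runs = [(6400, 10.0), (6400, 10.0), (6000, 10.0), (6200, 10.0)]
total = machine_throughput(runs)
print(f"machine throughput: {total} samples/sec")
```

Summing rather than averaging reflects the intent stated above: the score should capture what the whole machine delivers when every GPU is busy.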


Run
===

* Milabench manager process

  * Handles messages from benchmark processes
  * Saves messages to a file for future analysis

* Benchmark processes

  * Run using ``voir``
  * ``voir`` is configured to intercept and send events during the training process
  * This allows us to add models from git repositories without modification
  * ``voir`` sends data through a file descriptor created by the milabench main process
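
The file-descriptor mechanism described above can be illustrated with a minimal, POSIX-only sketch. It is not the actual ``voir`` or milabench implementation: the event fields, the JSON-lines encoding, and every name below are assumptions made for the example. A parent process creates a pipe, passes the write end to a child, and collects the events the child reports.

```python
import json
import os
import subprocess
import sys

# Code run by the child process (stand-in for an instrumented benchmark).
# It receives a file descriptor number and writes one JSON event per line.
CHILD = """
import json, os, sys
fd = int(sys.argv[1])
with os.fdopen(fd, "w") as out:
    for step in range(3):
        event = {"event": "rate", "step": step, "value": 100.0 + step}
        out.write(json.dumps(event) + "\\n")
"""


def run_and_collect() -> list[dict]:
    """Spawn the child and gather every event it sends through the pipe."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_fd)],
        pass_fds=(write_fd,),  # keep the write end open in the child
    )
    os.close(write_fd)  # the parent only reads; closing lets EOF arrive
    with os.fdopen(read_fd) as stream:
        events = [json.loads(line) for line in stream]
    proc.wait()
    return events


events = run_and_collect()
print(events)
```

Because the child only needs a descriptor number, the benchmark script itself does not have to know anything about the manager, which is consistent with the goal of running third-party models without modification.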


What milabench is
=================

* Training focused
* milabench shows candid performance numbers

  * No optimization beyond batch size scaling is performed
  * We want to measure the performance our researchers will see,
    not the performance they could get.

* PyTorch centric

  * PyTorch has become the de facto library for research
  * We are looking for accelerators with good maturity that can support
    this framework with limited code changes.


What milabench is not
=====================

* milabench's goal is not to be a performance showcase for an accelerator.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions docs/process.rst → docs/Contributing/process.rst
@@ -8,6 +8,7 @@ Preparing

* NVIDIA
* AMD
* Intel

2. Create a milabench configuration for your RFP
Milabench comes with a wide variety of benchmarks.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 4 additions & 0 deletions docs/Welcome/Changelog.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Changelog
=========

TBD
54 changes: 54 additions & 0 deletions docs/Welcome/Features.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
Features
========

* Non-intrusive instrumentation
* Validation Layers
* Automatic batch resizing
* Docker
* Hardware

  * ROCm 5.7
  * NVIDIA

* Metrics gathering

  * Performance throughput
  * GPU util
  * CPU util
  * IO util


Benchmarks
----------

.. code-block:: text

   +--------------------------+-----------+-----------+-------------+-----------+-------------------+
   | Benchmark                | Unit      | Domain    | Network     | Focus     | Task              |
   +==========================+===========+===========+=============+===========+===================+
   | bf16                     | TFlops    | Synthetic |             | Training  |                   |
   | fp16                     | TFlops    | Synthetic |             | Training  |                   |
   | tf32                     | TFlops    | Synthetic |             | Training  |                   |
   | fp32                     | TFlops    | Synthetic |             | Training  |                   |
   | bert-fp16                |           | NLP       | Transformer | Training  | Language Modeling |
   | bert-fp32                |           | NLP       | Transformer | Training  | Language Modeling |
   | bert-tf32                |           | NLP       | Transformer | Training  | Language Modeling |
   | bert-tf32-fp16           |           | NLP       | Transformer | Training  | Language Modeling |
   | opt-1_3b                 |           | NLP       | Transformer | Training  | Language Modeling |
   | opt-6_7b                 |           | NLP       | Transformer | Training  | Language Modeling |
   | reformer                 |           | NLP       | Transformer | Training  | Language Modeling |
   | rwkv                     |           | NLP       | RNN         | Training  | Language Modeling |
   | llama                    | Token/sec | NLP       | Transformer | Inference | Generation        |
   | dlrm                     |           | NLP       |             | Training  | Recommendation    |
   | convnext_large-fp16      | img/sec   | Vision    | Convolution | Training  | Classification    |
   | convnext_large-fp32      | img/sec   | Vision    | Convolution | Training  | Classification    |
   | convnext_large-tf32      | img/sec   | Vision    | Convolution | Training  | Classification    |
   | convnext_large-tf32-fp16 | img/sec   | Vision    | Convolution | Training  | Classification    |
   | davit_large              | img/sec   | Vision    | Transformer | Training  | Classification    |
   | focalnet                 |           | Vision    | Convolution | Training  | Classification    |
   | davit_large-multi        | img/sec   | Vision    | Transformer | Training  | Classification    |
   | regnet_y_128gf           | img/sec   | Vision    | Convolution | Training  | Classification    |
   | resnet152                | img/sec   | Vision    | Convolution | Training  | Classification    |
   | resnet152-multi          | img/sec   | Vision    | Convolution | Training  | Classification    |
   | resnet50                 | img/sec   | Vision    | Convolution | Training  | Classification    |
   | stargan                  | img/sec   | Vision    | Convolution | Training  | GAN               |
   | super-slomo              | img/sec   | Vision    | Convolution | Training  |                   |
   | t5                       |           | NLP       | Transformer | Training  |                   |
   | whisper                  |           | Audio     |             | Training  |                   |
   +--------------------------+-----------+-----------+-------------+-----------+-------------------+
10 changes: 10 additions & 0 deletions docs/Welcome/Roadmap.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
Roadmap
=======

* Cloud CI
* ROCm 6.0 - MI300 support
* GPU Max Series - 1550 support
* Evaluate suitability

  * Tenstorrent
  * Graphcore
  * Cerebras
40 changes: 32 additions & 8 deletions docs/index.rst
@@ -2,18 +2,42 @@
Welcome to milabench's documentation!
=====================================


.. toctree::
:caption: News
:maxdepth: 1

Welcome/Features
Welcome/Roadmap
Welcome/Changelog


.. toctree::
:maxdepth: 2
:caption: Contents:
:caption: Getting Started

GettingStarted/usage.rst
GettingStarted/docker.rst


usage.rst
recipes.rst
new_benchmarks.rst
.. toctree::
:caption: Contributing
:maxdepth: 1

Contributing/new_benchmarks
Contributing/sizer
Contributing/dev-usage
Contributing/design
Contributing/flow
Contributing/execution_modes
Contributing/recipes


.. toctree::
:caption: API
:maxdepth: 1

docker.rst
dev-usage.rst
reference.rst
sizer.rst
ref-pack.rst


Indices and tables