Move documentation around

mila-iqia · Jan 13, 2025 · 8407dca
1 parent 9ee81d7
commit 8407dca

Showing 16 changed files with 150 additions and 8 deletions.
diff --git a/docs/config.rst → docs/Contributing/config.rst b/docs/config.rst → docs/Contributing/config.rst
diff --git a/docs/Contributing/design.rst b/docs/Contributing/design.rst
@@ -0,0 +1,49 @@
+Design
+======
+
+Milabench aims to simulate research workloads for benchmarking purposes.
+
+* Performance is measured as throughput (samples per second).
+  For example, for a model like resnet the throughput would be images per second.
+
+* Single-GPU workloads are spawned once per GPU to ensure the entire machine is used,
+  simulating something similar to a hyperparameter search.
+  The performance of the benchmark is the sum of the throughput of each process.
+
+* Multi GPU workloads
+
+* Multi Nodes
+
+
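The aggregation rule in the Design list (one process per GPU, benchmark score equal to the sum of per-process throughput) can be sketched as follows. This is a minimal illustration, not milabench's actual code: the function names and the sample numbers are hypothetical.

```python
from typing import Iterable, Tuple


def process_throughput(samples: int, seconds: float) -> float:
    """Throughput of one benchmark process, in samples per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return samples / seconds


def benchmark_throughput(runs: Iterable[Tuple[int, float]]) -> float:
    """Sum the throughput of every per-GPU process.

    `runs` holds one (samples, seconds) pair per process; this layout is
    an assumption made for the example only.
    """
    return sum(process_throughput(samples, seconds) for samples, seconds in runs)


# Two hypothetical single-GPU processes running side by side on one machine.
runs = [(6400, 10.0), (6000, 10.0)]
total = benchmark_throughput(runs)
print(f"machine throughput: {total} samples/s")
```

Summing, rather than averaging, reflects the stated intent that the whole machine is being measured, as in a hyperparameter search where every GPU runs its own job.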
+Run
+===
+
+* Milabench Manager Process
+   * Handles messages from benchmark processes
+   * Saves messages into a file for future analysis
+
+* Benchmark processes
+   * run using ``voir``
+   * voir is configured to intercept and send events during the training process
+   * This allows us to add models from git repositories without modification
+   * voir sends data through a file descriptor that was created by the milabench main process
+
+
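The Run section describes a manager process that creates a file descriptor and benchmark processes that send events through it. The sketch below shows that general pattern with the Python standard library only. It is not voir's or milabench's implementation: the JSON-lines message format, the event fields, and the way the descriptor number is passed on the command line are all assumptions, and `pass_fds` is POSIX-only.

```python
import json
import os
import subprocess
import sys

# Hypothetical benchmark process: writes one JSON event per line to the
# file descriptor whose number it receives as its first argument.
CHILD = """
import json, os, sys
fd = int(sys.argv[1])
with os.fdopen(fd, "w") as out:
    for step in range(3):
        out.write(json.dumps({"event": "rate", "step": step}) + "\\n")
"""


def run_and_collect():
    """Spawn the child with an inherited pipe and collect its events."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_fd)],
        pass_fds=(write_fd,),  # keep the write end open in the child
    )
    # The manager keeps only the read end, so it sees EOF once the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd) as stream:
        events = [json.loads(line) for line in stream]
    proc.wait()
    return events


if __name__ == "__main__":
    for event in run_and_collect():
        print(event)
```

Because the events travel over a dedicated descriptor instead of stdout, the benchmark's own output stays untouched, which fits the stated goal of running models from git repositories without modification.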
+What milabench is
+=================
+
+* Training focused
+* milabench shows candid performance numbers
+   * No optimization beyond batch size scaling is performed
+   * We want to measure the performance our researchers will see,
+     not the performance they could get.
+* PyTorch centric
+   * PyTorch has become the de facto library for research
+   * We are looking for accelerators with good maturity that can support
+     this framework with limited code changes.
+
+
+What milabench is not
+=====================
+
+* milabench's goal is not to be a performance showcase for an accelerator.
diff --git a/docs/dev-usage.rst → docs/Contributing/dev-usage.rst b/docs/dev-usage.rst → docs/Contributing/dev-usage.rst
diff --git a/docs/execution_modes.rst → docs/Contributing/execution_modes.rst b/docs/execution_modes.rst → docs/Contributing/execution_modes.rst
diff --git a/docs/flow.rst → docs/Contributing/flow.rst b/docs/flow.rst → docs/Contributing/flow.rst
diff --git a/docs/instrument.rst → docs/Contributing/instrument.rst b/docs/instrument.rst → docs/Contributing/instrument.rst
diff --git a/docs/new_benchmarks.rst → docs/Contributing/new_benchmarks.rst b/docs/new_benchmarks.rst → docs/Contributing/new_benchmarks.rst
diff --git a/docs/process.rst → docs/Contributing/process.rst b/docs/process.rst → docs/Contributing/process.rst
@@ -8,6 +8,7 @@ Preparing
 
    * NVIDIA
    * AMD
+   * Intel
 
 2. Create a milabench configuration for your RFP
    Milabench comes with a wide variety of benchmarks.

diff --git a/docs/recipes.rst → docs/Contributing/recipes.rst b/docs/recipes.rst → docs/Contributing/recipes.rst
diff --git a/docs/sizer.rst → docs/Contributing/sizer.rst b/docs/sizer.rst → docs/Contributing/sizer.rst
diff --git a/docs/docker.rst → docs/GettingStarted/docker.rst b/docs/docker.rst → docs/GettingStarted/docker.rst
diff --git a/docs/usage.rst → docs/GettingStarted/usage.rst b/docs/usage.rst → docs/GettingStarted/usage.rst
diff --git a/docs/Welcome/Changelog.rst b/docs/Welcome/Changelog.rst
@@ -0,0 +1,4 @@
+Changelog
+=========
+
+TBD
diff --git a/docs/Welcome/Features.rst b/docs/Welcome/Features.rst
@@ -0,0 +1,54 @@
+Features
+========
+
+* Non-intrusive instrumentation
+* Validation Layers
+* Automatic batch resizing
+* Docker
+* Hardware
+   * ROCm 5.7
+   * NVIDIA
+* Metrics gathering
+   * Performance throughput
+   * GPU util
+   * CPU util
+   * IO util
+
+
+Benchmarks
+----------
+
+.. code-block:: text
+    +--------------------------+-----------+-----------+-------------+-----------+-------------------+
+    |        Benchmark         |   Unit    |  Domain   |   Network   |   Focus   |       Task        |
+    +==========================+===========+===========+=============+===========+===================+
+    | bf16                     | TFlops    | Synthetic |             | Training  |                   |
+    | fp16                     | TFlops    | Synthetic |             | Training  |                   |
+    | tf32                     | TFlops    | Synthetic |             | Training  |                   |
+    | fp32                     | TFlops    | Synthetic |             | Training  |                   |
+    | bert-fp16                |           | NLP       | Transformer | Training  | Language Modeling |
+    | bert-fp32                |           | NLP       | Transformer | Training  | Language Modeling |
+    | bert-tf32                |           | NLP       | Transformer | Training  | Language Modeling |
+    | bert-tf32-fp16           |           | NLP       | Transformer | Training  | Language Modeling |
+    | opt-1_3b                 |           | NLP       | Transformer | Training  | Language Modeling |
+    | opt-6_7b                 |           | NLP       | Transformer | Training  | Language Modeling |
+    | reformer                 |           | NLP       | Transformer | Training  | Language Modeling |
+    | rwkv                     |           | NLP       | RNN         | Training  | Language Modeling |
+    | llama                    | Token/sec | NLP       | Transformer | Inference | Generation        |
+    | dlrm                     |           | NLP       |             | Training  | Recommendation    |
+    | convnext_large-fp16      | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | convnext_large-fp32      | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | convnext_large-tf32      | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | convnext_large-tf32-fp16 | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | davit_large              | img/sec   | Vision    | Transformer | Training  | Classification    |
+    | focalnet                 |           | Vision    | Convolution | Training  | Classification    |
+    | davit_large-multi        | img/sec   | Vision    | Transformer | Training  | Classification    |
+    | regnet_y_128gf           | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | resnet152                | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | resnet152-multi          | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | resnet50                 | img/sec   | Vision    | Convolution | Training  | Classification    |
+    | stargan                  | img/sec   | Vision    | Convolution | Training  | GAN               |
+    | super-slomo              | img/sec   | Vision    | Convolution | Training  |                   |
+    | t5                       |           | NLP       | Transformer | Training  |                   |
+    | whisper                  |           | Audio     |             | Training  |                   |
+    +--------------------------+-----------+-----------+-------------+-----------+-------------------+
diff --git a/docs/Welcome/Roadmap.rst b/docs/Welcome/Roadmap.rst
@@ -0,0 +1,10 @@
+Roadmap
+=======
+
+* Cloud CI
+* ROCm 6.0 - MI300 support
+* GPU Max Series - 1550 support
+* Evaluate suitability
+   * Tenstorrent
+   * Graphcore
+   * Cerebras
diff --git a/docs/index.rst b/docs/index.rst
@@ -2,18 +2,42 @@
 Welcome to milabench's documentation!
 =====================================
 
+
+.. toctree::
+   :caption: News
+   :maxdepth: 1
+
+   Welcome/Features
+   Welcome/Roadmap
+   Welcome/Changelog
+
+
 .. toctree::
    :maxdepth: 2
-   :caption: Contents:
+   :caption: Getting Started
+
+   GettingStarted/usage.rst
+   GettingStarted/docker.rst
+
 
-   usage.rst
-   recipes.rst
-   new_benchmarks.rst
+.. toctree::
+   :caption: Contributing
+   :maxdepth: 1
+
+   Contributing/new_benchmarks
+   Contributing/sizer
+   Contributing/dev-usage
+   Contributing/design
+   Contributing/flow
+   Contributing/execution_modes
+   Contributing/recipes
+
+
+.. toctree::
+   :caption: API
+   :maxdepth: 1
 
-   docker.rst
-   dev-usage.rst
-   reference.rst
-   sizer.rst
+   ref-pack.rst
 
 
 Indices and tables