From 25d48b6af1464f278283b07349d087b1c1007e4e Mon Sep 17 00:00:00 2001
From: Sebastian Hoffmann
Date: Mon, 6 Jan 2025 14:56:42 +0100
Subject: [PATCH] chore: README

---
 README.md | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 21b655f..c1449a1 100644
--- a/README.md
+++ b/README.md
@@ -9,35 +9,42 @@ A torch library for easy distributed deep learning on HPC clusters. Supports bot
 ## Highlights
 - Simple, yet powerful, API
 - Easy initialization of `torch.distributed`
-- Distributed checkpointing and metrics
+- Distributed metrics
 - Extensive logging and diagnostics
 - Wandb support
+- Tensorboard support
 - A wealth of useful utility functions
 
 ## Installation
 dmlcloud can be installed directly from PyPI:
-```
+```bash
 pip install dmlcloud
 ```
 
 Alternatively, you can install the latest development version directly from Github:
-```
+```bash
 pip install git+https://github.com/sehoffmann/dmlcloud.git
 ```
 
 ## Minimal Example
-See [examples/mnist.py](https://github.com/sehoffmann/dmlcloud/blob/develop/examples/mnist.py) for a minimal and barebone example on how to distributely train MNIST.
-To run it on a single node with 4 GPUs, use
-```
-dmlrun -n 4 python examples/barebone_mnist.py
+See [examples/mnist.py](https://github.com/sehoffmann/dmlcloud/blob/develop/examples/mnist.py) for a minimal example of how to train MNIST on multiple GPUs. To run it with 4 GPUs, use
+```bash
+dmlrun -n 4 python examples/mnist.py
 ```
+`dmlrun` is a thin wrapper around `torchrun` that makes it easier to prototype on a single node.
 
-`dmlrun` is a thin wrapper around `torchrun` that makes development work on a single node easier.
 
+## Slurm Support
+*dmlcloud* automatically looks for Slurm environment variables to initialize `torch.distributed`. On a Slurm cluster, you can therefore simply use `srun` from within an sbatch script to train on multiple nodes:
+```bash
+#!/bin/bash
+#SBATCH --nodes=2
+#SBATCH --ntasks-per-node=4
+#SBATCH --gpus-per-node=4
+#SBATCH --cpus-per-task=8
+#SBATCH --gpu-bind=none
 
-To run your training across multiple nodes on a slurm cluster instead, you can simply use `srun`:
-```
-srun --ntasks-per-node [NUM_GPUS] python examples/barebone_mnist.py
+srun python examples/mnist.py
 ```
 
 ## Documentation
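
A note on the Slurm support added by this patch: the new section says dmlcloud picks up Slurm environment variables to initialize `torch.distributed`. As a rough sketch of what such auto-detection generally involves — illustrative only, not dmlcloud's actual implementation; the `init_from_slurm` helper, its `master_addr` parameter, and the default port are assumptions for the example — Slurm's `SLURM_PROCID`, `SLURM_NTASKS`, and `SLURM_LOCALID` variables supply the global rank, world size, and local rank that `torch.distributed` needs:

```python
# Hedged sketch: how Slurm environment variables typically map onto
# torch.distributed initialization. Illustrative only; NOT dmlcloud's code.
import os

import torch
import torch.distributed as dist


def init_from_slurm(master_addr: str, master_port: int = 29500):
    """Initialize torch.distributed from Slurm-provided environment variables."""
    rank = int(os.environ["SLURM_PROCID"])        # global rank of this task
    world_size = int(os.environ["SLURM_NTASKS"])  # total number of tasks
    local_rank = int(os.environ["SLURM_LOCALID"]) # rank of this task within its node

    # The env:// rendezvous reads these variables.
    os.environ.setdefault("MASTER_ADDR", master_addr)
    os.environ.setdefault("MASTER_PORT", str(master_port))
    os.environ["RANK"] = str(rank)
    os.environ["WORLD_SIZE"] = str(world_size)

    backend = "nccl" if torch.cuda.is_available() else "gloo"
    if backend == "nccl":
        # One GPU per task: each task selects its device by local rank.
        torch.cuda.set_device(local_rank)

    dist.init_process_group(backend=backend, init_method="env://")
    return rank, local_rank, world_size
```

With `--gpu-bind=none`, as in the sbatch script above, every GPU on a node stays visible to every task on that node, so each task selects its own device by local rank — which is what the `torch.cuda.set_device(local_rank)` line in the sketch does.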