Implemented --steps in lightning benchmark, Puhti results
mvsjober committed Sep 18, 2023
1 parent d5d2a07 commit a754ba9
Showing 3 changed files with 19 additions and 2 deletions.
3 changes: 2 additions & 1 deletion benchmarks/pytorch_visionmodel_lightning.py
@@ -74,6 +74,7 @@ def train(args):
accelerator='gpu',
strategy='ddp',
precision=precision,
+max_steps=args.steps,
callbacks=[BenchmarkingCallback(args.warmup_steps,
args.batchsize,
world_size)])
@@ -115,7 +116,7 @@ def main():
help='Batch size')
parser.add_argument('-j', '--workers', type=int, default=10,
help='Number of data loader workers')
-parser.add_argument('--steps', type=int, required=False,
+parser.add_argument('--steps', type=int, required=False, default=-1,
help='Maxium number of training steps')
parser.add_argument('--warmup-steps', type=int, default=10,
help='Number of initial steps to ignore in average')
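The `default=-1` matters because `args.steps` is now forwarded to the trainer as `max_steps`: without a default, omitting `--steps` would forward `None`. In PyTorch Lightning, `-1` is also the Trainer's own default for `max_steps` and means no step-based limit. A minimal sketch of the argument handling alone, using only `argparse`; `build_parser` and `describe_limit` are hypothetical names for illustration and are not part of the benchmark script:

```python
import argparse


def build_parser():
    # Mirrors only the two step-related options shown in this hunk.
    parser = argparse.ArgumentParser()
    parser.add_argument('--steps', type=int, required=False, default=-1,
                        help='Maximum number of training steps')
    parser.add_argument('--warmup-steps', type=int, default=10,
                        help='Number of initial steps to ignore in average')
    return parser


def describe_limit(steps):
    # Hypothetical helper: -1 is the sentinel that gets passed on as max_steps.
    return 'no step limit' if steps == -1 else f'stop after {steps} steps'


if __name__ == '__main__':
    parser = build_parser()
    for argv in ([], ['--steps', '500']):
        args = parser.parse_args(argv)
        print(argv, '->', args.steps, '|', describe_limit(args.steps))
```

With this default, a run that omits `--steps` behaves as it did before the option was wired in, while a short benchmark can cap the run explicitly.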
2 changes: 1 addition & 1 deletion pytorch-ddp-lightning.sh
@@ -4,7 +4,7 @@ export NCCL_DEBUG=INFO
SCRIPT="benchmarks/pytorch_visionmodel_lightning.py"
IMAGENET_DATA=/scratch/dac/data/ilsvrc2012-torch-resized-new.tar

-SCRIPT_OPTS="--strategy=ddp --warmup-steps 100 --workers=$SLURM_CPUS_PER_TASK"
+SCRIPT_OPTS="--strategy=ddp --warmup-steps 10 --workers=$SLURM_CPUS_PER_TASK"

if [ $(( $NUM_GPUS * $SLURM_NNODES )) -ne $SLURM_NTASKS ]; then
echo "ERROR: this script needs to be run as one task per GPU. Try using slurm/*-mpi.sh scripts."
16 changes: 16 additions & 0 deletions results.md
@@ -71,3 +71,19 @@
| DeepSpeed, synthetic data | PyTorch 2.0.0+cu117 | mahti | 8 | 2023-09-15 | 5813.62 |
| Horovod, synthetic | PyTorch 2.0.0+cu117 | mahti | 8 | 2023-09-15 | 5235.30 |
| Horovod, Imagenet data | PyTorch 2.0.0+cu117 | mahti | 8 | 2023-09-15 | 5230.77 |
| DDP, synthetic | PyTorch 2.0.0+cu117 | puhti | 1 | 2023-09-16 | 331.39 |
| DDP, synthetic | PyTorch 2.0.0+cu117 | puhti | 4 | 2023-09-16 | 1245.59 |
| DDP, synthetic | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 2473.86 |
| DDP, synthetic, fp16 | PyTorch 2.0.0+cu117 | puhti | 1 | 2023-09-16 | 674.17 |
| DDP, synthetic, fp16 | PyTorch 2.0.0+cu117 | puhti | 4 | 2023-09-16 | 2389.34 |
| DDP, synthetic, fp16 | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 4644.40 |
| DDP Lightning, synthetic | PyTorch 2.0.0+cu117 | puhti | 1 | 2023-09-16 | 331.98 |
| DDP Lightning, synthetic | PyTorch 2.0.0+cu117 | puhti | 4 | 2023-09-16 | 1254.01 |
| DDP Lightning, synthetic | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 2488.49 |
| DDP, Imagenet data | PyTorch 2.0.0+cu117 | puhti | 1 | 2023-09-16 | 329.76 |
| DDP, Imagenet data | PyTorch 2.0.0+cu117 | puhti | 4 | 2023-09-16 | 1244.49 |
| DDP, Imagenet data | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 2470.56 |
| DeepSpeed, synthetic data | PyTorch 2.0.0+cu117 | puhti | 4 | 2023-09-16 | 1262.18 |
| DeepSpeed, synthetic data | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 2429.24 |
| Horovod, synthetic | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 2314.87 |
| Horovod, Imagenet data | PyTorch 2.0.0+cu117 | puhti | 8 | 2023-09-16 | 2313.93 |
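
One way to read the new Puhti rows is as scaling efficiency relative to the single-GPU run. The sketch below uses the three "DDP, synthetic" rows and assumes, since the table header lies outside this hunk, that the fourth column is the GPU count and the last column is throughput; `scaling_efficiency` is a hypothetical helper name.

```python
# Values copied from the "DDP, synthetic" puhti rows above.
# Assumption: keys are GPU counts, values are throughput
# (the column header and units are not visible in this hunk).
ddp_synthetic = {1: 331.39, 4: 1245.59, 8: 2473.86}


def scaling_efficiency(results):
    """Return throughput at n GPUs divided by n times the 1-GPU throughput."""
    base = results[1]
    return {n: value / (n * base) for n, value in results.items()}


if __name__ == '__main__':
    for n, eff in sorted(scaling_efficiency(ddp_synthetic).items()):
        print(f'{n} GPUs: {eff:.1%} of linear scaling')
```

The same function can be applied to any other configuration in the table that has a 1-GPU row to serve as the baseline.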
