Some LUMI fixes
mvsjober committed Jun 6, 2023
1 parent 0072537 commit c6b5c36
Showing 2 changed files with 18 additions and 9 deletions.
17 changes: 17 additions & 0 deletions lumi-memory-bug.py
@@ -0,0 +1,17 @@
import torch
import torchvision
import sys

def main(bs):
    batch = torch.rand((bs, 3, 32, 32)).cuda()
    model = torchvision.models.resnet18().cuda()
    print(f'Feeding batch with size {bs} to model..')
    model(batch)
    print('Done.')

if __name__ == '__main__':
    bs = 256
    if len(sys.argv) > 1:
        bs = int(sys.argv[1])
    main(bs)
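
The new lumi-memory-bug.py is a minimal reproducer: it moves one random batch and a ResNet-18 to the GPU and runs a single forward pass, with the batch size settable from the command line. As a hedged sketch only (not part of the commit), one way to see how much GPU memory that forward pass actually claims is to query PyTorch's caching allocator around the call; torch.cuda.memory_allocated() is available on both CUDA and ROCm builds of PyTorch, where torch.cuda maps to HIP on LUMI's AMD GPUs:

import torch
import torchvision

def report(label):
    # Bytes currently held by PyTorch's caching allocator on the default GPU.
    mb = torch.cuda.memory_allocated() / 2**20
    print(f'{label}: {mb:.1f} MiB allocated')

bs = 256  # same default batch size as the reproducer above
batch = torch.rand((bs, 3, 32, 32)).cuda()
model = torchvision.models.resnet18().cuda()
report('after moving batch and model to GPU')
model(batch)
torch.cuda.synchronize()  # make sure the forward pass has finished
report('after forward pass')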

10 changes: 1 addition & 9 deletions pytorch-ddp.sh
@@ -5,15 +5,7 @@ SCRIPT="benchmarks/pytorch_visionmodel_ddp.py"
 IMAGENET_DATA=/scratch/dac/data/ilsvrc2012-torch-resized-new.tar
 
 DIST_OPTS="--standalone --master_port 0"
-SCRIPT_OPTS="--warmup-steps 100"
-
-#if [ "$LMOD_FAMILY_PYTHON_ML_ENV" != "pytorch" ]
-#then
-#  echo "WARNING: no pytorch module loaded, loading default module"
-#  module load pytorch
-#fi
-
-which python3
+SCRIPT_OPTS="--warmup-steps 100 --workers=$SLURM_CPUS_PER_TASK"
 
 if [ "$SLURM_NTASKS" -ne "$SLURM_NNODES" ]; then
     echo "ERROR: this script needs to be run as one task per node."
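
The only functional change to pytorch-ddp.sh is dropping the commented-out module-loading logic and the which python3 debug line, and passing --workers=$SLURM_CPUS_PER_TASK to the benchmark. The benchmark source is not shown here, but a flag like --workers is typically wired into the DataLoader's num_workers; a minimal sketch under that assumption (argument names and the dummy dataset are illustrative only):

import argparse
import torch
from torch.utils.data import DataLoader, TensorDataset

parser = argparse.ArgumentParser()
parser.add_argument('--workers', type=int, default=0,
                    help='number of DataLoader worker processes')
parser.add_argument('--warmup-steps', type=int, default=100)
args = parser.parse_args()

# Dummy data stands in for the real ImageNet pipeline; num_workers controls
# how many CPU processes load and preprocess batches in parallel, which is
# why matching it to $SLURM_CPUS_PER_TASK is a sensible default on a node.
dataset = TensorDataset(torch.rand(256, 3, 224, 224),
                        torch.zeros(256, dtype=torch.long))
loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    num_workers=args.workers, pin_memory=True)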
