I've seen a few other open issues related to batch size increases, but this one isn't about troubleshooting, so I thought it warranted a separate issue. I'm training on multiple GPUs on an HPC cluster and have been experimenting with increasing the batch size by editing the nnUNetv2 plans file. I'm seeing a linear increase in epoch time, which you'd probably expect, since the number of iterations per epoch doesn't change.
It's a little annoying that the benefits of increasing the batch size are offset by the linearly increasing total training time: with default settings my total training time is around 30 hours at a batch size of 2, so scaling to a batch size of 8 pushes it to roughly 5 days. Are there any recommendations for how the number of iterations per epoch, the total number of epochs, or the learning rate should be altered when using a custom batch size?
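To make the scaling concrete: with the iterations per epoch held fixed, each epoch processes `batch_size × iterations` samples, so epoch time (and total training time) grows roughly linearly with batch size. A back-of-the-envelope sketch in plain Python, using the numbers above and assuming perfectly linear scaling (real GPUs may deviate once they saturate):

```python
def scaled_training_hours(base_hours, base_batch_size, new_batch_size):
    """Estimate total training time after a batch-size change, assuming
    epoch time scales linearly with batch size (iteration count fixed)."""
    return base_hours * new_batch_size / base_batch_size

# Numbers from this issue: ~30 h total at batch size 2.
hours_bs8 = scaled_training_hours(30, 2, 8)
print(hours_bs8)       # 120.0 hours
print(hours_bs8 / 24)  # 5.0 days, matching the observed slowdown
```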
Hi, did you solve this? I ran into the same problem when increasing the training batch_size: although my GPU has enough memory, training is much slower than with the smaller batch_size.
@sydat2701 No, I've stuck with the default settings for now (though I did see a performance boost from the increased batch size when using the Residual Encoder instead). If you're using a large batch size, I'd recommend changing some of the default hyperparameter values in the nnUNetTrainer.py file. Lowering the number of iterations per epoch and raising the learning rate could help, but I haven't tried it yet, so let me know if you do and see better results/training times :)
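For anyone experimenting with this, a common starting point (my assumption, not something nnU-Net prescribes) is to shrink the iterations per epoch inversely with the batch-size increase, so each epoch still sees the same number of samples, and to raise the learning rate linearly with the batch size (the "linear scaling rule"). A minimal sketch of that arithmetic, using defaults resembling nnUNetv2's (250 iterations/epoch, initial LR 1e-2 — check your trainer for the actual values):

```python
def scale_hyperparams(base_iters, base_lr, base_batch_size, new_batch_size):
    """Return (iterations_per_epoch, learning_rate) adjusted for a new batch
    size: iterations scaled down to keep samples/epoch constant, LR scaled up
    per the linear scaling rule. A heuristic -- validate on your own data."""
    factor = new_batch_size / base_batch_size
    iters = max(1, round(base_iters / factor))
    lr = base_lr * factor
    return iters, lr

# Going from batch size 2 (250 iters/epoch, lr 1e-2) to batch size 8:
iters, lr = scale_hyperparams(250, 1e-2, 2, 8)
print(iters, lr)  # 62 iterations/epoch, lr 0.04
```

With these values, total training time should stay close to the batch-size-2 baseline, since the samples processed per epoch are unchanged; whether the higher learning rate actually preserves accuracy is something you'd need to verify per dataset.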