Code gets stuck when using DDP #1922
Unanswered
cunangjiang asked this question in General
Replies: 6 comments
-
GPU memory still has free space, but GPU utilization is 0%.
-
Hi @cunangjiang, did you check the RAM usage?
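One way to act on the RAM suggestion is to log memory once per epoch on each rank and check whether it keeps growing until training stalls. This is a minimal sketch using only the standard library, assuming Linux (it reads `/proc/meminfo`). The helper names (`peak_rss_mib`, `mem_available_mib`, `log_memory`) are hypothetical and not part of diff_model_train.py; in a real DDP run, `rank` would come from `torch.distributed.get_rank()`.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / 2**20
    return peak / 2**10


def mem_available_mib(meminfo_path: str = "/proc/meminfo") -> float:
    """System-wide MemAvailable from /proc/meminfo, in MiB (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line format: "MemAvailable:   12345678 kB"
                return int(line.split()[1]) / 2**10
    raise RuntimeError("MemAvailable not found in " + meminfo_path)


def log_memory(epoch: int, rank: int = 0) -> str:
    """Print and return one memory line for this rank (hypothetical helper)."""
    msg = (f"[rank {rank}] epoch {epoch}: peak RSS {peak_rss_mib():.0f} MiB, "
           f"available RAM {mem_available_mib():.0f} MiB")
    print(msg, flush=True)
    return msg


if __name__ == "__main__":
    # Hypothetical usage: call once at the end of every epoch in the training loop.
    for epoch in range(3):
        log_memory(epoch)
```

If available RAM shrinks epoch after epoch and the hang arrives sooner with a larger dataset, that would point toward host memory pressure rather than the GPU. Note that `ru_maxrss` only covers the calling process, not DataLoader worker processes that are still running.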
-
Could you please share the command you use to run the training? And are you using the latest version? Without these details, it's hard to reproduce the issue and figure out the cause.
-
When I train with DDP using diff_model_train.py, the code gets stuck at a certain epoch. When I reduce the size of the dataset, training runs for more epochs before it hangs. How can I solve this problem? For example, as shown in the figure, the code stays stuck at this point without raising any errors.
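Because the hang raises no error, it can help to see which Python line each rank is blocked on. This is a sketch using the standard `faulthandler` module, assuming a Unix-like system (`SIGUSR1` is not available on Windows). The function names and the 600-second timeout are hypothetical choices, not part of diff_model_train.py.

```python
import faulthandler
import signal


def install_hang_diagnostics(timeout_s: float = 600.0) -> None:
    """Set up stack dumps for a process that may hang (hypothetical helper)."""
    # On demand: `kill -USR1 <pid>` makes this process print every thread's stack to stderr.
    faulthandler.register(signal.SIGUSR1, all_threads=True)
    # Watchdog: if not re-armed within timeout_s seconds, dump all stacks (repeating every timeout_s).
    faulthandler.dump_traceback_later(timeout_s, repeat=True)


def heartbeat(timeout_s: float = 600.0) -> None:
    """Re-arm the watchdog; call after each training step or epoch."""
    # Calling dump_traceback_later again replaces the previous timer and resets the timeout.
    faulthandler.dump_traceback_later(timeout_s, repeat=True)


if __name__ == "__main__":
    install_hang_diagnostics(timeout_s=600.0)
    for step in range(3):
        heartbeat(timeout_s=600.0)  # in real training: after optimizer.step()
    # Disarm before a clean exit.
    faulthandler.cancel_dump_traceback_later()
```

Running this on every rank and comparing the dumped stacks shows whether all ranks are waiting in the same place (for example, inside a backward pass or a collective) or whether one rank is stuck somewhere else, such as in data loading.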