Question about LAMMPS MACE MD simulation using multiple GPU nodes. #498
turbosonics asked this question in Q&A
-
It is possible to do what you want, but note that our documentation says
The reason is that multi-GPU will likely be much slower (for now). If, hearing this, you still want to try for memory reasons, you will need something like
-
I opened issue #487 about a CUDA out-of-memory crash with a 'huge' geometry, i.e. a system with more than 10k atoms. Following recommendations from others, I'm fine-tuning an MP-0 'small' model with r_max 4 to see whether that helps.
At the same time, I wonder whether it would be possible to run such "huge" geometries with the MACE MP-0 model on our local cluster.
Our local cluster has only one GPU per GPU node, and each GPU has 40 GB of memory. With this setup, I can't run geometries with that many atoms using the MACE MP-0 model in LAMMPS on a single GPU node, so I hope to use multiple GPU nodes.
We are using slurm, so I tried
and executed using
lmp -k on g 2 -sf kk -in MACE.input
But I still see exactly the same CUDA OOM crash as in the single-GPU-node case. It seems that MACE in LAMMPS does not recognize or use the memory provided by multiple GPU nodes; instead, it still uses only the memory of a single GPU node.
Could LAMMPS-MACE utilize more memory resources from multiple GPU nodes?
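For context on the command above: in the LAMMPS KOKKOS package, the `g` value in `-k on g N` is the number of GPUs per node, and additional nodes are only used when `lmp` is started through an MPI launcher with one rank per GPU. A minimal dry-run sketch of such a launch line for the one-GPU-per-node cluster described here follows. The `srun` options and the `NODES` and `GPUS_PER_NODE` variables are assumptions for illustration, not taken from the thread, and whether the MACE pair style then spreads memory across ranks as hoped is not confirmed here.

```shell
#!/bin/sh
# Dry run: builds and prints the launch line instead of executing it.
# NODES and GPUS_PER_NODE are hypothetical values for a cluster with
# one GPU per node; adjust them to the actual allocation.
NODES=2
GPUS_PER_NODE=1

# One MPI rank per GPU, so total ranks = nodes * GPUs per node.
NTASKS=$((NODES * GPUS_PER_NODE))

# "-k on g" takes the number of GPUs per node, not the total GPU count.
LAUNCH="srun --nodes=$NODES --ntasks=$NTASKS --ntasks-per-node=$GPUS_PER_NODE lmp -k on g $GPUS_PER_NODE -sf kk -in MACE.input"

echo "$LAUNCH"
```

Running the printed line inside a Slurm allocation that spans two GPU nodes would start two MPI ranks, one per node. As the reply notes, a multi-GPU run is expected to be much slower for now, so this is only worth trying when memory is the limiting factor.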