We check availability of CUDA-aware MPI as follows:
CUDA_AWARE_MPI = False
# check whether OpenMPI support CUDA-aware MPI
if "openmpi" in os.environ.get("MPI_SUFFIX", "").lower():
    buffer = subprocess.check_output(["ompi_info", "--parsable", "--all"])
    CUDA_AWARE_MPI = b"mpi_built_with_cuda_support:value:true" in buffer
# MVAPICH
CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("MV2_USE_CUDA") == "1"
# MPICH
CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("MPIR_CVAR_ENABLE_HCOLL") == "1"
# ParaStationMPI
CUDA_AWARE_MPI = CUDA_AWARE_MPI or os.environ.get("PSP_CUDA") == "1"
On some systems I am using, MPI_SUFFIX is empty even though OpenMPI is installed (and used by Heat). In such cases the automatic check does not work, and one has to set heat.CUDA_AWARE_MPI = True manually.
Questions
Is this a bug in our code, or a bug in the systems that leave MPI_SUFFIX empty?
If it is a bug in our code, how can we find a catch-all version of our check?
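One possible direction for the second question, as a rough sketch rather than a proposed patch: drop the dependency on MPI_SUFFIX and query ompi_info whenever it can be found on PATH. The function names `openmpi_has_cuda_support` and `detect_cuda_aware_mpi` are hypothetical and not part of Heat. The sketch also assumes that the ompi_info found on PATH belongs to the MPI library that is actually loaded, which may not hold on systems with several MPI installations.

```python
import os
import shutil
import subprocess


def openmpi_has_cuda_support(ompi_info="ompi_info"):
    """Query ompi_info directly instead of relying on MPI_SUFFIX (hypothetical helper)."""
    exe = shutil.which(ompi_info)
    if exe is None:
        # ompi_info is not on PATH, so OpenMPI cannot be probed this way.
        return False
    try:
        buffer = subprocess.check_output(
            [exe, "--parsable", "--all"], stderr=subprocess.DEVNULL
        )
    except (OSError, subprocess.CalledProcessError):
        # Treat a failing probe as "no CUDA support detected".
        return False
    return b"mpi_built_with_cuda_support:value:true" in buffer


def detect_cuda_aware_mpi(environ=None, probe_openmpi=openmpi_has_cuda_support):
    """Combine the environment flags from the existing check with the OpenMPI probe."""
    environ = os.environ if environ is None else environ
    # Same variables as in the current check: MVAPICH, MPICH, ParaStationMPI.
    env_flags = ("MV2_USE_CUDA", "MPIR_CVAR_ENABLE_HCOLL", "PSP_CUDA")
    if any(environ.get(name) == "1" for name in env_flags):
        return True
    return probe_openmpi()


if __name__ == "__main__":
    print("CUDA-aware MPI detected:", detect_cuda_aware_mpi())
```

Passing the environment and the probe as arguments keeps the logic testable without an MPI installation. A failed or missing probe returns False, so the manual override would still be needed where ompi_info is not on PATH.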
Code snippet triggering the error
Error message or erroneous outcome
Version
main (development branch)
Python version
None
PyTorch version
None
MPI version