Sorry to hear you are running into problems with ROMIO from MPICH 3.2.1.
The patch that promotes the offending datatype to a 64-bit value is pmodels/mpich@3a479ab0, though it might not be worth backporting to whichever version of Open MPI you are running: Open MPI has since updated its bundled ROMIO to 3.4.1, which should contain the fix.
Git commit
develop HEAD 135808d
Target Platform
University of Edinburgh Extreme Scaling system “Tursa”
Each node: 2 × AMD EPYC (Rome) 32-core CPUs, NVIDIA A100 (40 GB), 1 TB RAM
Linux tursa-login1 4.18.0-305.10.2.el8_4.x86_64 #1 SMP Mon Jul 12 04:43:18 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Configure
Attachments
Issue Description
When MPI-2 I/O is configured to use the romio321 library, MPI_File_read_all() fails when reading >= 2 GB into a single MPI rank.
Issue Workaround
Other MPI-2 I/O libraries do not have this limit/bug. Switching to ompio, for example, resolves the issue on Tursa.
Note: romio321 is currently the recommended MPI-2 I/O library on Tursa, and commissioning performance tests were carried out using it. I see a performance hit when using ompio (~5 GB/s) instead of romio321 (~10 GB/s) on a single node, but I have not tested how this scales.
Minimal reproducer -- MPIRead32.cpp
MPIRead32.cpp (https://github.com/mmphys/MPIRead32) is a minimal program that reproduces the issue. Note that it is independent of Grid.
To demonstrate the issue, we run the following command on Tursa:
Re-running the same command, but this time choosing the ompio I/O library, works around the issue:
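The exact command lines are in the attached Bad.log and Good.log; a plausible invocation, assuming a binary built from MPIRead32.cpp named `MPIRead32` and a test file argument, uses Open MPI's MCA parameter to select the I/O component:

```shell
# Hypothetical invocation -- see the attached logs for the exact commands.
# Select the ROMIO 3.2.1 component (fails for >= 2 GB per rank):
mpirun -np 1 --mca io romio321 ./MPIRead32 testfile.bin

# Work around the bug by selecting OMPIO instead:
mpirun -np 1 --mca io ompio ./MPIRead32 testfile.bin
```

The `--mca io <component>` parameter is Open MPI's standard mechanism for choosing between its I/O implementations; only the binary name and file argument above are assumptions.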
Grid reproducer -- GaugeLoad.cpp
The issue was first noticed on Tursa when using Grid to load a Gauge field.
To demonstrate the issue, we run the following command on Tursa:
Re-running the same command, but this time choosing the ompio I/O library, works around the issue:
config.log
grid.configure.summary.log
GridMakeV1.txt
MPIRead32.cpp.txt
Bad.log
Good.log
GaugeLoad.cpp.txt
GridBad.log
GridGood.log