The -f/-l options are used to restrict the number of tasks mdtest runs to a smaller subset of the total size specified via the job's MPI parameters.

In this mode a subset of the ranks does not participate in the test, and those ranks have to be managed properly so that they rejoin the participating ranks at the end.

The recently refactored logic fixed one issue but created another in the corner case of size > first == last. In this scenario only one rank participates in the test, but all ranks were duping MPI_COMM_WORLD, and the barrier behavior was not correct for that case, resulting in a hang.

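For illustration, here is a minimal, self-contained sketch of the failing pattern (this is not the mdtest source; the first/last variables and the elided test phase are stand-ins for the real -f/-l handling): every rank dups MPI_COMM_WORLD, but with size > first == last only one rank ever reaches the barrier on the duped communicator, so the job hangs.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    int first = 0, last = 0;           /* illustrative -f/-l values: first == last */
    MPI_Comm testComm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every rank dups the world communicator, so testComm contains all ranks. */
    MPI_Comm_dup(MPI_COMM_WORLD, &testComm);

    if (rank >= first && rank <= last) {
        /* Only the participating rank(s) enter here.  With size > 1 and
         * first == last, exactly one rank waits on a barrier over a
         * communicator that contains every rank: it never completes. */
        MPI_Barrier(testComm);
    }

    MPI_Comm_free(&testComm);          /* never reached by the stuck rank */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```
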
This change solves the problem by making the logic common (a new group and communicator will always be created for the test, whether it is for all ranks or a subset) and ensuring that any ranks which aren't participating are handled in the same manner as in ior.c:117.

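As a rough sketch of that approach, under the same illustrative assumptions as above (hypothetical first/last values, elided test phase), the common path could look like this: every rank collectively creates a group covering ranks first..last, ranks outside the range receive MPI_COMM_NULL from MPI_Comm_create and skip straight to the final world-wide synchronization, similar in spirit to the handling referenced at ior.c:117.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    int first = 0, last = 0;                 /* stand-ins for the -f/-l values */
    int range[1][3];
    MPI_Group worldGroup, testGroup;
    MPI_Comm testComm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Always build the test communicator from a group covering ranks
     * first..last, whether that is all ranks or only a subset. */
    range[0][0] = first;
    range[0][1] = last;
    range[0][2] = 1;                         /* stride */
    MPI_Comm_group(MPI_COMM_WORLD, &worldGroup);
    MPI_Group_range_incl(worldGroup, 1, range, &testGroup);
    MPI_Comm_create(MPI_COMM_WORLD, testGroup, &testComm);

    if (testComm != MPI_COMM_NULL) {
        /* Participating ranks: run the test; every barrier inside the
         * test uses testComm, never MPI_COMM_WORLD. */
        MPI_Barrier(testComm);
        MPI_Comm_free(&testComm);
    }
    /* Non-participating ranks got MPI_COMM_NULL and fall through here. */

    MPI_Group_free(&testGroup);
    MPI_Group_free(&worldGroup);

    MPI_Barrier(MPI_COMM_WORLD);             /* all ranks rejoin at the end */
    MPI_Finalize();
    return 0;
}
```

Because MPI_Comm_create is collective over MPI_COMM_WORLD, all ranks take the same path up to the communicator creation, and only the MPI_COMM_NULL check differs afterwards, which removes the special case between full-size and subset runs.
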
The relevant code is here:
ior/src/ior.c, line 117 (commit 9f97b10)