fix: SegFault when an MPI rank has no data of a sub-region #3529
base: develop
Conversation
…n sets of pairs over ranks. If a rank has no element in its set, it no longer results in a crash.
…ey/system-solution-scaling-crash-fix
I'm sure there's a reason I haven't thought about, but why not use `std::pair`? Example: https://godbolt.org/z/3sxTeGrzs
src/coreComponents/physicsSolvers/fluidFlow/CompositionalMultiphaseFVM.cpp
Hi @corbett5!
```cpp
/* no default get() implementation, please add a template specialization
   and add it in the "testMpiWrapper" unit test. */
template< typename FIRST, typename SECOND >
MPI_Datatype const mpiPairType;

template<> MPI_Datatype const mpiPairType< float, int > = MPI_FLOAT_INT;
template<> MPI_Datatype const mpiPairType< double, int > = MPI_DOUBLE_INT;
template<> MPI_Datatype const mpiPairType< int, int > = MPI_2INT;
template<> MPI_Datatype const mpiPairType< long int, int > = MPI_LONG_INT;
template<> MPI_Datatype const mpiPairType< long int, long int > = getMpiCustomPairType< long int, long int >();
template<> MPI_Datatype const mpiPairType< long long int, long long int > = getMpiCustomPairType< long long int, long long int >();
template<> MPI_Datatype const mpiPairType< double, long int > = getMpiCustomPairType< double, long int >();
template<> MPI_Datatype const mpiPairType< double, long long int > = getMpiCustomPairType< double, long long int >();
template<> MPI_Datatype const mpiPairType< double, double > = getMpiCustomPairType< double, double >();
```
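For reference, `getMpiCustomPairType` (not shown in the snippet) could plausibly be built with `MPI_Type_create_struct`; here is a minimal sketch under that assumption, where `mpiType< T >` is a hypothetical variable template mapping a C++ scalar type to its MPI datatype:

```cpp
// Hedged sketch: build a committed MPI struct type matching a
// trivially-copyable { FIRST first; SECOND second; } pair.
template< typename FIRST, typename SECOND >
MPI_Datatype getMpiCustomPairType()
{
  struct Pair { FIRST first; SECOND second; };

  int const blockLengths[2] = { 1, 1 };
  MPI_Aint const displacements[2] = { static_cast< MPI_Aint >( offsetof( Pair, first ) ),
                                      static_cast< MPI_Aint >( offsetof( Pair, second ) ) };
  MPI_Datatype const types[2] = { mpiType< FIRST >, mpiType< SECOND > };  // assumed mapping

  MPI_Datatype pairType;
  MPI_Type_create_struct( 2, blockLengths, displacements, types, &pairType );

  // Resize so the extent accounts for any trailing padding of the C++ struct.
  MPI_Datatype resizedType;
  MPI_Type_create_resized( pairType, 0, sizeof( Pair ), &resizedType );
  MPI_Type_commit( &resizedType );
  MPI_Type_free( &pairType );
  return resizedType;
}
```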
@corbett5 I added a new proposal in the "proposal #1" commit.
I remember an old discussion about the std::pair layout not being guaranteed to be contiguous in memory, which I think is the reason it was not used in the first place. I don't recall the details of how it was being used, though.
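A quick sketch of the concern: the guarantees below hold for a plain struct, but the standard does not promise them for std::pair.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

// MPI_DOUBLE_INT is specified to match this plain C layout:
struct DoubleInt
{
  double value;
  int index;
};

// Both properties are guaranteed for the plain struct...
static_assert( std::is_standard_layout< DoubleInt >::value, "required for offsetof" );
static_assert( offsetof( DoubleInt, value ) == 0, "first member at offset 0" );

// ...but the standard does not promise that std::pair< double, int > is
// standard-layout or padding-free, and offsetof on a non-standard-layout
// type is only conditionally supported, so the same checks on std::pair
// are not portable.
```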
Personally I don't have a strong preference between using a single function or separate ones.
I think the single function is cleaner, but in my opinion this is a small improvement to a minor part of this PR, and I think the current implementation is fine.
This is exactly why I chose the … About the …
I think I have finished writing the last comments; we may be good to go?
LGTM. Thanks @MelReyCG
```cpp
  GEOS_ERROR_IF_NE( MPI_Op_create( customOpFunc, 1, &mpiOp ), MPI_SUCCESS );
  return mpiOp;
};
static MPI_Op mpiOp{ createOpHolder() };
```
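For context, a hedged reconstruction of the surrounding function implied by this fragment (`customOpFunc` is assumed to be the reduction callback defined elsewhere in the PR):

```cpp
// Hedged reconstruction: the function-local static ensures
// MPI_Op_create runs exactly once, on first use.
MPI_Op getMpiCustomOp()
{
  auto const createOpHolder = [] ()
  {
    MPI_Op mpiOp;
    GEOS_ERROR_IF_NE( MPI_Op_create( customOpFunc, 1, &mpiOp ), MPI_SUCCESS );
    return mpiOp;
  };
  // Meyers-singleton style: initialized on first call, shared afterwards.
  static MPI_Op mpiOp{ createOpHolder() };
  return mpiOp;
}
```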
Why does this need to be static? `mpiOp` is returned by value, so it shouldn't matter if it gets destroyed after the function returns.
The static storage here does matter, as `MPI_Op` (or `MPI_Datatype` for custom types) represents a persistent MPI resource that must:
- be initialized only once to avoid memory leaks (the `MPI_X_create()` functions allocate internal resources),
- remain valid for the entire MPI lifetime (`MPI_Init` -> `MPI_Finalize`),
- be shared across all calls to this function.
The static construction I used here is a variation of the "Meyers singleton", which lazily constructs a unique instance of an object when the resource is first requested (by calling the lambda, in this case).
At first I did not mind calling `MPI_Op_free()` since we already call `MPI_Finalize()`, but I found the following in the MPI documentation:

> The call to MPI_FINALIZE does not free objects created by MPI calls; these objects are freed using MPI_XXX_FREE, MPI_COMM_DISCONNECT, or MPI_FILE_CLOSE calls.
To completely manage these resources' lifetimes, I will work a bit more on this PR.
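One known MPI idiom for this (an assumption about a possible fix, not necessarily what this PR will do): `MPI_Finalize` deletes all attributes attached to `MPI_COMM_SELF` before tearing anything else down, so a delete callback registered there can free cached ops and types at the right moment.

```cpp
// Sketch: free a cached op via an MPI_COMM_SELF attribute destructor,
// which MPI_Finalize invokes before shutdown.
static int freeCachedOp( MPI_Comm /*comm*/, int /*keyval*/, void * attributeVal, void * /*extraState*/ )
{
  MPI_Op op = *static_cast< MPI_Op * >( attributeVal );
  return MPI_Op_free( &op );
}

static void registerOpCleanup( MPI_Op & cachedOp )
{
  int keyval;
  MPI_Comm_create_keyval( MPI_COMM_NULL_COPY_FN, freeCachedOp, &keyval, nullptr );
  // cachedOp must outlive the attribute (e.g. the function-local static above).
  MPI_Comm_set_attr( MPI_COMM_SELF, keyval, &cachedOp );
}
```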
Oh god, yeah I get it. If performance isn't a concern (and I doubt it is), I would simply create a new op every time and free it every time as well. You could do this pretty easily with a RAII object that has a move constructor and a deleted copy constructor.
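A minimal sketch of such a RAII wrapper (the class name and shape are illustrative, not from the PR):

```cpp
// Owns an MPI_Op for the duration of one reduction call.
class ScopedMpiOp
{
public:
  explicit ScopedMpiOp( MPI_User_function * func )
  { MPI_Op_create( func, 1, &m_op ); }

  ~ScopedMpiOp()
  { if( m_op != MPI_OP_NULL ) MPI_Op_free( &m_op ); }

  // Movable: the moved-from object gives up ownership.
  ScopedMpiOp( ScopedMpiOp && other ) noexcept
    : m_op( other.m_op )
  { other.m_op = MPI_OP_NULL; }

  // Non-copyable: exactly one owner frees the op.
  ScopedMpiOp( ScopedMpiOp const & ) = delete;
  ScopedMpiOp & operator=( ScopedMpiOp const & ) = delete;

  MPI_Op get() const { return m_op; }

private:
  MPI_Op m_op = MPI_OP_NULL;
};
```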
I prefer my approach (in which these objects are created and destroyed on the fly) since it would eliminate the need for static variables, but this is still quite an improvement and I'm not sure it's worth your time to rewrite it.
Closing #3528
My goal here is to solve the crash, and to propose a way to reduce (min/max) pairs lexicographically over MPI ranks. That can typically be useful to get the min/max value (pressure, temperature, ...) along with its `globalIndex` in the mesh (while ensuring that the `globalIndex` will be stable). It could be generalized to tuples; let me know if that could be useful.
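As an illustration of this use-case (variable names here are hypothetical): with `MPI_MINLOC`, ties on the value are broken by taking the minimum index, which is what makes the returned `globalIndex` stable.

```cpp
// Hypothetical usage: global minimum pressure plus a stable globalIndex.
struct DoubleInt
{
  double value;  // pressure
  int index;     // globalIndex (narrowed to int for MPI_DOUBLE_INT)
};

// A rank with no local data would contribute a neutral element, e.g.
// { std::numeric_limits< double >::max(), std::numeric_limits< int >::max() }.
DoubleInt local{ localMinPressure, static_cast< int >( localMinGlobalIndex ) };
DoubleInt global;
MPI_Allreduce( &local, &global, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD );
// global.value is the minimum pressure across ranks;
// global.index is the smallest globalIndex attaining it (hence stable).
```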