Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field get/set functions, type safety, and violating containment #239

Open
wrtobin opened this issue Jul 8, 2019 · 4 comments
Open

Field get/set functions, type safety, and violating containment #239

wrtobin opened this issue Jul 8, 2019 · 4 comments
Assignees

Comments

@wrtobin
Copy link
Collaborator

wrtobin commented Jul 8, 2019

So while working on PR #234 to support complex fields @cwsmith and I discussed the fact that the get/set(Scalar,Vector,Matrix) API functions are not type safe in that a Scalar/Vector/Matrix/Packed field can be passed to any of these functions as a Field*, which is then down-cast to the appropriate type. This conversion is well-defined in the case that the passed in Field* actually does point to a field of the appropriate sub-class and thus with the appropriate 'Value' type.

But static_cast doesn't check that the down-cast is appropriate: dynamic_cast does do this check (and would work in this case because all of the Field sub-classes are polymorphic). So that would be the appropriate change to make. Of course that cast operation uses RTTI to ensure the result is actually a pointer to the correct type, but that adds overhead to these functions, which are definitely tight-loop functions, which I assume is why the static_cast is what is currently used (though the internal call to getNodeValue is still polymorphic so these functions can never be inlined/highly optimized, they will always require at least that one dynamic function dispatch.

In order to no add any call overhead to these functions I will be adding a PCU_DEBUG_ASSERT against the value of field->getValueType() to make things slighty safer and not add any overhead on optimized runs of any downstream codes, not that we ever expect this to be fast.

Now the real weirdness and the real reason for the issue:

All of these getter/setter functions call FieldOf<T>::get/setNodeValue() which takes a pointer to the 'value' of the field, so for SCALAR and PACKED this is just double but for VECTOR and MATRIX it is Vector3 and Matrix3x3 respectively. Inside this function the T* argument is reinterpret_cast<double*> (or const double* for setting). This means the operation being performed is:

Vector3/Matrix3x3 value; // given as input
double * cmps = reinterpret_cast<double*>(value);

So we're casting a pointer to an object to a pointer to a double. The Vector3 and Matrix3x3 classes are both ultimately inherited from can::Array. In the case of Vector3, it makes some sense that the address of the object is also (apparently in general) the address of the array of double array inside of the object. In the Matrix3x3 case the array contains 3 Vector3 objects, each of which I imagine when layed out in memory also just consist of the length 3 double arrays they contain, so this makes some sense too.

However, this casting a pointer to an object to a pointer to the 'first' data member of an object is not covered by the standard and the result in general is undefined behavior. I'm certain the reason we've gotten away with this for so long is that it is the most sensible way for the compiler to put together the actual memory layout of the class, but it isn't defined this way by the standard, so we're making a huge assumption.

While these classes do have an inheritance hierarchy they are not polymorphic, so there is no function lookup table. But when there is a function lookup table, guess were it goes for each object of that class? Right at the start of the object in memory (technically it is implementation-define (like the address of the first data member is also)). So if we were to ever edit the can::Array, mth::Vector, or mth::Matrix classes and introduce a single virtual function (which we never will but nonetheless), this portion of the API instantly breaks.

Further (as noted), this violated encapsulation as we're directly accessing underlying protected data of the class. Granted in this case is is read-only which is slightly more forgivable, but nonetheless is is still an issue.

The only fix I can think of for this issue is to add an explicit cast operator to the Vector class:

explicit operator double*() {return &elems[0]}

this takes care of that issue. The Matrix class is a bigger problem since it is a container of multiple Vectors... I don't really have an elegant solution for this one.

@wrtobin
Copy link
Collaborator Author

wrtobin commented Jul 8, 2019

getScalar/Vector/MatrixNodes(Element,...) also does unsafe static_cast ing of superclass Element to subclasses ElementOf<double/Vector3/Matrix3x3>.

Also ElementOf<T> casts a NewArray<T> (T is double for all Fields, regardless of their valueType, into an array of size 1 of whatever the valueType is in order to retrieve nodal values associated with the element. See apfElementOf.h::32 and apfElementOf.h::37.

So this is what is occurring (in the Vector3 case) to get an array containing the correctly ordered nodal information for an element:

NewArray<double> nodeData(nen*nc); // double array sized to hold all dofs on the closure of the mesh entity, this already exists
NewArray<Vector3> values(nen);
Vector3 * nodeValues = reinterpret_cast<Vector3*>(&nodeData[0]);

So we're casting an array of (for a tet) 12 doubles to an array of 4 Vector3s. This is somewhat the reverse of what we were doing above, but I went and looked in to the standard and there are categories (similar to POD) of class where the pointer to the class is also guaranteed by the standard to be a pointer to the first data member, and so I guess by virtue of that you can also get a pointer to a location in memory that exposes the same underlying memory structure as an object of the class and treat it as though it was an object of the class.

If the classes we use for nodal storage and retrieval is 'standard layout' (which is slightly less strict than being POD, but still has many restrictions), both of these issues I've identified actually do adhere to the standard. However we need to document this and make it explicit in the code. Since C++11 there has been std::is_pod and std::is_standard_layout. I will add these to the template implementation of the appropriate data structures in apf to ensure a developer never tries to get too fancy with field storage types.

See here for some discussion of standard layout versus POD.

@jacobmerson
Copy link
Contributor

A question for the CORE developers (@cwsmith @mortezah) is if we currently allow c++11? @wrtobin I suspect that you would need to at least confirm that these c++ functions are available on BGQ.

wrtobin pushed a commit that referenced this issue Jul 8, 2019
…implementing ability to get Element nodes for a packed field, restricting ElementOf<T> to be standard layout so the underlying data retrieval cast is valid (see issue #239)
@wrtobin
Copy link
Collaborator Author

wrtobin commented Jul 9, 2019

So it turns out one of the variants of the functions to create Elements creates a raw, untyped Element, rather than calling the field which creates an appropriately typed ElementOf. T

This means that the downcast in the get/set-ter functions isn't valid for these elements since they are not of type ElementOf, which should be undefined behavior, but since the only thing it is doing is calling a subclass function to retrieve data that exists in the superclass (doing some of the casting from fundamental types to std::is_standard_layout types) it seems like we're getting away with it.

This also means we cannot assert that we are able to cast the passed in Element to an ElementOf using dynamic_cast which will fail since the RTTI isn't there for the cast to succeed, so we will receive a NULL pointer from the cast.

Since in this case the ElementOf class only really serves as an access abstraction over the underlying array data for the field (whether and array per node (tag) or a frozen field), a valid way to solve this would involve modifying the Element class hierarchy, removing ElementOf and refactoring the concept into a field array valueType access abstraction functor to actually retrieve the underlying field data in the preferred form, since we can get/set field values for a field which 9 or more doubles per node using any of the get/set-ter functions in a valid way. So a Packed field with >=9 components is also a Matrix3x3 field is also a Vector3 field is also a Scalar field, the only change is how much of the underlying array we are accessing and all of the functional niceties associated with each of the field valueTypes.

This isn't going to happen any time soon though, so for now we either (1) can't have type protection on the get/set-ter functions, or (2) we need to require that fields implement both element constructor functions so we never create raw Elements (except in PACKED case I think..).

I'm going to exclude this from my work on complex -- which is very close to being ready to merge -- and leave it for later.

@jacobmerson
Copy link
Contributor

jacobmerson commented Jun 5, 2022

Came across this again in the code...

It's really bad that we are reinterpret casting Matrix/Vector, etc classes to double * in the FieldOf getNodeValue and setNodeValue. This is certainly UB and it's surprising that this hasn't broken anything yet.

For the type safety issue @wrtobin brought up we should profile how bad dynamic_cast is. If it's too much overhead we can at least compare against Field getValueType.

@mortezah you can assign me this issue. I'll put together the fix/timing hopefully sometime this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants