Description
This is, modulo a massive and possibly unneeded re-organization and some bug fixes (the first 5 commits are not strictly needed for this work), the implementation of #214.
The consensus I reached in #214 is to use the numpy `dtype.str` and `dtype.descr` as 2 additional keys, which gives us enough information to identify both "built in" types and structured types using a pre-existing scheme. This was picked over the PEP 3118 string formatting due to the wider adoption and better documentation of the numpy scheme over the PEP scheme. 2 keys were chosen over 1 key of variable type to avoid the type instability. There may be a case that the descr field should be extra optional (we must have `'dtype'`, we may have a `'dtype_str'`, and if we have a `'dtype_str'` we may also have a `'dtype_descr'`).

The rules for getting back to the numpy dtype are:

- if the kind of the dtype is not `'V'`, then `dt = np.dtype(dk['dtype_str'])`
- if the kind is `'V'` (a structured type), then `dt = np.dtype(dk['dtype_descr'])`
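A minimal sketch of those rules (the helper name is mine, and the list-of-lists handling assumes the descr may arrive via JSON; neither is from the PR):

```python
import numpy as np

def dtype_from_data_key(dk):
    """Recover the numpy dtype from a data_key dict per the rules above.

    'dtype_str' / 'dtype_descr' are the keys proposed in this PR; the
    helper itself is illustrative, not code from the PR.
    """
    if np.dtype(dk["dtype_str"]).kind != "V":
        # "built in" types: the typestr alone round-trips
        return np.dtype(dk["dtype_str"])
    # structured types: the typestr is just '|V<N>', so fall back to descr
    # (tuple() guards against descr entries arriving as lists via JSON)
    return np.dtype([tuple(field) for field in dk["dtype_descr"]])

# a plain type: '<f8' is fully self-describing
plain = np.dtype("float64")
dk = {"dtype_str": plain.str, "dtype_descr": plain.descr}
assert dtype_from_data_key(dk) == plain

# a structured type: .str is only '|V9', .descr carries the fields
structured = np.dtype([("a", "u1"), ("b", "f8")])
dk = {"dtype_str": structured.str, "dtype_descr": structured.descr}
assert dtype_from_data_key(dk) == structured
```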
These rules are fiddly, but I think acceptable. It may be possible to get more inside the head of `np.dtype` and pass some function in numpy both the str and the descr and let it sort things out, but I have not found that function yet.

There is more information in the `__array_interface__` bundle, like the offsets or padding, that we are not capturing here because that is a hardware-dependent detail and not machine-invariant structure. That is, from the point of view of the event model, `[('a', 'u1'), ('b', 'f8')]` with the float aligned to the byte boundary or to the 8-byte boundary are "the same" (see the sketch below). Describing the exact in-memory layout should be left to a library (like tiled!) that handles serialization / communication between processes.

Related: given the above discussion, one could argue that we should drop the endianness of the data (as that is the poster child for machine-dependent details!), but I think carrying around a bit of "too detailed" information is an acceptable cost of not having to invent and describe a variation on the numpy scheme that ignores endianness.
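A quick illustration of the alignment point (not code from the PR):

```python
import numpy as np

# Two in-memory layouts of the same logical [('a', 'u1'), ('b', 'f8')] structure
packed = np.dtype([("a", "u1"), ("b", "f8")])               # 'b' at offset 1
aligned = np.dtype([("a", "u1"), ("b", "f8")], align=True)  # 'b' padded to offset 8

print(packed.itemsize, aligned.itemsize)              # 9 16
print(packed.fields["b"][1], aligned.fields["b"][1])  # offsets: 1 8
# Both name the same fields with the same types, so from the event model's
# point of view they are "the same"; the padding is a serialization detail.
```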
Motivation and Context
Closes #214
How Has This Been Tested?
Docs
Need to edit and migrate my ranting in #214 to the docs.
Cross-project work
These keys will need to be added to the output of `.describe`. This should be back-compatible: the model never said `data_keys` was not allowed to have additional keys, so anything consuming them should already be able to ignore the extra keys (but it is not zero-risk).
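For concreteness, a hypothetical `data_keys` entry carrying the proposed keys might look like the following (the `source` value is made up; `'array'` is from the existing event-model dtype vocabulary):

```python
import numpy as np

dt = np.dtype([("a", "u1"), ("b", "f8")])

# Hypothetical data_keys entry from .describe(); 'source', 'shape', and
# 'dtype' are existing event-model keys, the last two are proposed here.
data_key = {
    "source": "SIM:det",                         # made-up source string
    "shape": [10],
    "dtype": "array",
    "dtype_str": dt.str,                         # '|V9'
    "dtype_descr": [list(f) for f in dt.descr],  # [['a', '|u1'], ['b', '<f8']]
}
```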