# 32_and_64_bit_indexing_in_Silo
Be sure also to look at issues #1934 and #1935
Various integer arrays in Silo objects and API calls are sized using 32 bit integer types. Consider a `DBzonelist` object comprising more than about 250,000,000 (250 million) hexahedral zones. Since each zone requires 8 values, the resulting zonelist will have a length, the `lnodelist` member of a `DBzonelist`, greater than can be stored in a 32 bit `int` type.
Various Silo objects and associated API calls for writing them involve the use of an integer array whose size is specified as an `int` (32 bit) type and whose values are then used to index into other arrays. The best example is a `DBzonelist` object. The integer values in the `nodelist` member of a `DBzonelist` object index the coordinate arrays of an associated `DBucdmesh` object. Another example is a `DBmaterial` object, specifically with mixing materials. Without mixing, the `matlist` member of a `DBmaterial` object contains only integer material ids and we do not ever expect the need for more than 32 bits to index all possible materials. However, with mixing materials, the `matlist` member also includes ones-complemented (i.e. negative) indices into an integer based linked list of mixing material ids. The language data type used for the size and values of these index arrays in the API calls and in the Silo object types is `int`.
Ordinarily, we decompose large meshes for parallel processing. When we do, we would not expect any given mesh block to require 64 bit indexing or sizing. The use case here is a single node running many hardware threads on a single, large mesh block. A single large mesh block incurs less per-thread overhead than a decomposition with one block per thread. In addition, on newer processors and compilers, 64 bit integer arithmetic may in fact be faster than 32 bit arithmetic, encouraging developers to favor 64 bit storage regardless of actual mesh block size.
There is a definite downside to using 64 bit indexing and sizing when the objects are small enough for 32 bits. There is a 2x storage cost for the objects, both in memory and in the file. In addition to this space cost, there may be speed costs on some architectures and/or compilers that optimize for 32 bit integer operations, or when 32/64 bit conversions are needed.
Finally, using 64 bit indexing could potentially halve the effective memory bandwidth for any array-index bound computation. Why? The index data is twice the size. When data movement tends to dominate overall performance, 64 bit indexing would only exacerbate this problem.
This suggests we would want Silo to support both new, 64 bit objects as well as old/original 32 bit objects and API calls.
The trick is to develop a solution that allows both 32 and 64 bit variants of Silo objects and API calls to live harmoniously within a single library, a single executable and, as needed, a single Silo file, and that offers producers and consumers of such objects ways of managing the choice. While a compile-time approach to build a version of Silo that replaces `int` with a suitable 64 bit type is conceptually simplest and most expedient, a compile-time solution has too many downsides to be workable long term. It might, however, offer a first step in testing and assessing the issues for a final, full implementation.
In Silo, there are two places where changes for 64 bit integer types will impact things: the write API (e.g. `DBPutZonelist2`) and the Silo object structures returned by the read API (e.g. `DBzonelist`).
It is worth mentioning that a similar issue has already been dealt with in the Silo API for floating point data (e.g. `float` or `double` as well as other types) written in such calls as `DBPutUcdvar`. Here the pointer for arrays of data is a `void*` type and a `datatype` argument (or data member in a struct) is used to indicate the type of data the `void*` pointer points at. In this approach, we wind up with a single function call and struct object and use a type selector argument or data member to indicate the actual type. The alternative is multiple, type-specific implementations of the functions and associated structures. Let's call these the singular, type-selector approach and the multi-, type-specific approach.
Let's consider these options within the context of the example Silo call, `DBPutZonelist2`…

```c
DBPutZonelist2(DBfile *dbfile, const char *name, int nzones, int ndims,
    int const *nodelist, int lnodelist, int origin, int lo_offset, int hi_offset,
    int const *shapetype, int const *shapesize, int const *shapecnt, int nshapes,
    DBoptlist const *optlist)
```
Since we need to support both 32 and 64 bit sizes, all individual integer-valued arguments to `DBPutZonelist2` would need to be changed to a suitable 64-bit type. Let's call this type `db_int64_t`. Next, we would need to add a `datatype` argument to allow the caller to indicate which type is being used. Finally, the arrays of data would all become `void*` type. These changes would result in a new function signature…
```c
DBPutZonelist3(DBfile *dbfile, const char *name, db_int64_t nzones, int ndims,
    DBVCP1_t *nodelist, db_int64_t lnodelist, db_int64_t origin, db_int64_t lo_offset,
    db_int64_t hi_offset, DBVCP1_t *shapetype, DBVCP1_t *shapesize, DBVCP1_t *shapecnt,
    db_int64_t nshapes, int datatype, DBoptlist const *optlist)
```
where `datatype` could, in theory, be any of Silo's integer types (`DB_CHAR`, `DB_SHORT`, `DB_INT`, `DB_LONG_LONG`). Maybe we will need to introduce `DB_INT32` and `DB_INT64` types for this to make a little more sense.
How is integer promotion and/or implicit integer conversion going to be handled if callers, for example, pass `int` types for `db_int64_t` arguments? This needs to be investigated.
There may be an elegant software engineering solution that would permit us to just change the signature of `DBPutZonelist2` such that existing callers (without the `datatype` argument) would still work correctly. But, probably not. So, this function would then replace `DBPutZonelist2`, and `DBPutZonelist2` would be deprecated.
This singular, type-selector approach is more consistent with the rest of Silo’s API where the application can select a variety of data types for problem sized data.
In the multi-, type-specific approach, instead of introducing a new function signature to replace existing functions, we leave the old functions alone, treat them as supporting only 32 bit types, and add new functions to handle 64 bit types. So, for the 64 bit variant of `DBPutZonelist2`, we would have…
```c
DBPutZonelist264(DBfile *dbfile, const char *name, db_int64_t nzones, int ndims,
    db_int64_t const *nodelist, db_int64_t lnodelist, db_int64_t origin,
    db_int64_t lo_offset, db_int64_t hi_offset, db_int64_t const *shapetype,
    db_int64_t const *shapesize, db_int64_t const *shapecnt, db_int64_t nshapes,
    DBoptlist const *optlist)
```
In the Silo read interface, the caller gets back a `struct` embodying all the information in an object. If we introduce multiple read methods for the same kind of object, differing only in types, we introduce a question of object conversion. For example, in the old 32-bit-only Silo, `DBGetZonelist` returns the old, 32-bit `DBzonelist*`, and a new `DBGetZonelist64` call would return a `DBzonelist64*`. So, what happens if a caller executes one of these two get calls on an object of the other type? Should it just fail? Should it convert where it can? How might this impact backward compatibility?
If, however, in the Silo interface we wind up returning to the caller a new `DBzonelist` object that includes a `datatype` member to indicate the underlying type of the associated buffers, then this problem of object conversion goes away. Nonetheless, a caller may be required to inspect the new `datatype` member of the `DBzonelist` object before traversing the buffers, whereas in older versions of Silo those buffers could always be assumed to be 32 bit integer buffers.
The issues for producers and consumers are different. For producers, we care about link-time compatibility of the API. We'd prefer not to change the API in a way that breaks old clients. For consumers, a key issue is what happens if they go to read a zonelist object written by a producer as 64-bit even though the actual object could have been supported by a 32 bit implementation of the object. In that case, consumers are unable to read objects that they would otherwise have been able to read fine. Likewise, for a consumer designed to operate on 64 bit objects, what happens when they go to read objects from older files that are only 32 bit?
Optimal API compatibility is achieved by not changing any existing types or functions and introducing new types and functions for 64 bit objects. This suggests the multi-, type-specific approach is best for backward compatibility of the API for applications using the older API. However, if the changes necessary to support the new API in applications currently using the old API are not egregious, we should not preclude the singular, type-selector approach on this basis alone.
Optimal read-back compatibility is achieved when a caller of the old API can always be guaranteed to successfully read an object that actually fits within the constraints of the old 32-bit API, and otherwise fails with a new error code and error message indicating 32/64 bit incompatibility. Therefore, we will always want to support conversion of 64 bit objects in files to 32 bit objects in read calls if the underlying buffers do indeed fit in 32 bits.
So, the more I write up this approach, the more pitfalls I see, especially with impact on older clients.
- `DBPutZonelist2` is deprecated
  - Use of `DBPutZonelist2` will work in future versions of Silo but will generate deprecation warnings until it is finally removed from the API
- `DBPutZonelist3` is introduced with the above signature
  - Likewise for all other Silo API methods that previously assumed only 32 bit integer types
- All existing callers of `DBPutZonelist2` have to deal with the following
  - Pass `DB_INT` or `DB_INT32` (proposed) for the `datatype` argument
  - May have to deal with `int` arg promotion or implicit conversion to `db_int64_t`
- `DBzonelist`, the struct returned by a `DBGetZonelist` call, is modified in the following ways
  - All `int` members specifying size/length of buffers become `db_int64_t`
    - Possible impacts need to be investigated further
    - May corrupt any integer comparisons or arithmetic involving such members of `DBzonelist`
  - All `int*` members are changed to `void*`
    - Any callers referencing buffer values via the array-bracket operator `[]` will fail to compile without casting to `int*`
  - New `datatype` member indicates which Silo type the buffers are
- All
- Library wide property,
DBSetIntegerCompatability
defaults toDB_RETURN_INT_WHEN_POSSIBLE
- Means that any
DBGetXXX
call involving possible 32 or 64 bit integer data will always only return 32 bit data, converting where necesary, or fail if not possible.
- Means that any