From db84465f4a2fcb8e42e5692ed927a4761995181c Mon Sep 17 00:00:00 2001 From: Allen Byrne <50328838+byrnHDF@users.noreply.github.com> Date: Fri, 22 Nov 2024 14:40:29 -0600 Subject: [PATCH] Convert vfl html file to doxygen (#5140) --- doxygen/dox/TechnicalNotes.dox | 1021 +++++++++++++++++++- doxygen/examples/VFL.html | 1598 -------------------------------- 2 files changed, 1020 insertions(+), 1599 deletions(-) delete mode 100644 doxygen/examples/VFL.html diff --git a/doxygen/dox/TechnicalNotes.dox b/doxygen/dox/TechnicalNotes.dox index 61ddfbd3398..199d17de129 100644 --- a/doxygen/dox/TechnicalNotes.dox +++ b/doxygen/dox/TechnicalNotes.dox @@ -32,7 +32,1026 @@ /** \page VFL HDF5 Virtual File Layer -\htmlinclude VFL.html +\section sec_vfl_intro Introduction +The HDF5 file format describes how HDF5 data structures and dataset raw data are mapped +to a linear format address space and the HDF5 library implements that bidirectional mapping +in terms of an API. However, the HDF5 format specifications do not indicate how the format +address space is mapped onto storage and HDF (version 5 and earlier) simply mapped the format +address space directly onto a single file by convention. + +Since early versions of HDF5 it became apparent that users want the ability to map the +format address space onto different types of storage (a single file, multiple files, local +memory, global memory, network distributed global memory, a network protocol, etc.) with +various types of maps. For instance, some users want to be able to handle very large format +address spaces on operating systems that support only 2GB files by partitioning the format +address space into equal-sized parts each served by a separate file. Other users want the +same multi-file storage capability but want to partition the address space according to +purpose (raw data in one file, object headers in another, global heap in a third, etc.) +in order to improve I/O speeds. 
+ +In fact, the number of storage variations is probably larger than the number of methods +that the HDF5 team is capable of implementing and supporting. Therefore, a Virtual File +Layer API is being implemented which will allow application teams or departments to design +and implement their own mapping between the HDF5 format address space and storage, with each +mapping being a separate file driver (possibly written in terms of other file drivers). The +HDF5 team will provide a small set of useful file drivers which will also serve as examples +for those who wish to write their own: +
#H5FD_SEC2 | This is the default driver which uses Posix file-system functions +like read and write to perform I/O to a single file. All I/O requests are unbuffered +although the driver does optimize file seeking operations to some extent. + | +
#H5FD_STDIO | This driver uses functions from 'stdio.h' to perform buffered I/O to a single file. + | +
#H5FD_CORE | This driver performs I/O directly to memory and can be +used to create small temporary files that never exist on permanent storage. This +type of storage is generally very fast since the I/O consists only of memory-to-memory copy operations. + | +
#H5FD_MPIO | This is the driver of choice for accessing files in parallel +using MPI and MPI-IO. It is only predefined if the library is compiled with parallel I/O support. + | +
#H5FD_FAMILY | Large format address spaces are partitioned into more +manageable pieces and sent to separate storage locations using an underlying driver +of the user's choice. \ref H5TOOL_RT_UG can be used to change the sizes of the family +members when stored as files or to convert a family of files to a single file or vice versa. + | +
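The partitioning performed by a family-style driver can be illustrated with a short sketch. It assumes equal-sized members, as in the 2GB example in the introduction; the type `family_loc_t` and the function `family_locate` are hypothetical names used only for illustration and are not part of the HDF5 API.

```c
#include <stdint.h>

/* Hypothetical result type: which member holds a format address,
 * and at what byte offset inside that member. */
typedef struct {
    uint64_t member; /* zero-based index of the family member */
    uint64_t offset; /* byte offset within that member */
} family_loc_t;

/* Map a format address onto (member, offset) for equal-sized members.
 * memb_size must be non-zero. */
family_loc_t family_locate(uint64_t addr, uint64_t memb_size)
{
    family_loc_t loc;
    loc.member = addr / memb_size;
    loc.offset = addr % memb_size;
    return loc;
}
```

Because the mapping is a single division and remainder, the storage location for any format address can be found in constant time.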
static H5FD_t * open (const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr) |
+The file name name and file access property list fapl are the same as were specified in the #H5Fcreate +or #H5Fopen call. The flags are also the same as in those calls, except that #H5F_ACC_CREAT is additionally +set if the call was to #H5Fcreate; they are documented in the 'H5Fpublic.h' file. The maxaddr +argument is the maximum format address that the driver should be prepared to handle (the minimum address is always zero). | +
static herr_t close (H5FD_t *file) |
+The file argument is the handle which was returned by the open function, and the close should +free only memory associated with the driver-specific part of the handle (the public parts will +have already been released by HDF5's virtual file layer). | +
const int cmp (const H5FD_t *f1, const H5FD_t *f2) |
+The driver may provide a function which compares two files f1 and f2 belonging to the same +driver and returns a negative, positive, or zero value a la the strcmp function. (The ordering +is arbitrary as long as it's consistent within a particular file driver.) If this function is +not provided then HDF5 assumes that all calls to the open callback return unique files regardless +of the arguments, and it is up to the application to avoid opening the same file more than once if that assumption is incorrect. | +
Function | +Description | +
---|---|
static hsize_t sb_size (H5FD_t *file) |
+The sb_size function returns the number of bytes necessary to encode +information needed later if the file is reopened. | +
static herr_t sb_encode (H5FD_t *file, char *name, unsigned char *buf) |
+The sb_encode function encodes information from the file into buffer buf +allocated by the caller. It also writes an 8-character string (plus null terminator) into +the name argument, which should uniquely identify the driver. | +
static herr_t sb_decode (H5FD_t *file, const char *name, const unsigned char *buf) |
+The sb_decode function looks at the name, decodes data from the buffer buf, and +updates the file argument with the new information, advancing *p in the process. | +
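The three superblock callbacks can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes a hypothetical driver whose only persistent setting is a member size, stored as 8 little-endian bytes. The type `demo_file_t`, the driver name "DEMO", and the `demo_sb_*` functions are invented for this sketch, and plain `int` stands in for `herr_t`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical driver state: only the member size must survive a reopen. */
typedef struct {
    uint64_t memb_size;
} demo_file_t;

/* Number of bytes this driver needs in the superblock. */
size_t demo_sb_size(const demo_file_t *file)
{
    (void)file;
    return 8;
}

/* Write the driver name (at most 8 characters plus terminator) into name
 * and store memb_size in buf as 8 little-endian bytes. */
int demo_sb_encode(const demo_file_t *file, char *name, unsigned char *buf)
{
    int i;
    strcpy(name, "DEMO");
    for (i = 0; i < 8; i++)
        buf[i] = (unsigned char)((file->memb_size >> (8 * i)) & 0xff);
    return 0;
}

/* Check the name, then rebuild memb_size from the 8 encoded bytes. */
int demo_sb_decode(demo_file_t *file, const char *name, const unsigned char *buf)
{
    uint64_t value = 0;
    int i;
    if (strcmp(name, "DEMO") != 0)
        return -1; /* information was written by a different driver */
    for (i = 0; i < 8; i++)
        value |= (uint64_t)buf[i] << (8 * i);
    file->memb_size = value;
    return 0;
}
```

Encoding byte by byte, rather than copying the struct, keeps the stored form independent of the host's byte order.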
#H5FD_MEM_SUPER | userblock | +
#H5FD_MEM_BTREE | An allocation request for a node of a B-tree. + | +
#H5FD_MEM_DRAW | An allocation request for the raw data of a dataset. + | +
#H5FD_MEM_GHEAP | An allocation request for a global heap collection. Global +heaps are used to store certain types of references such as dataset region references. +The set of all global heap collections can become quite large. + | +
#H5FD_MEM_LHEAP | An allocation request for a local heap. Local heaps are used +to store the names which are members of a group. The combined size of all local heaps is +a function of the number of object names in the file. + | +
#H5FD_MEM_OHDR | An allocation request for (part of) an object header. Object +headers are relatively small and include meta information about objects (like the data +space and type of a dataset) and attributes. + | +
#H5FD_FLMAP_SINGLE | All memory usage types are mapped to a single free list. + | +
#H5FD_FLMAP_DICHOTOMY | Memory usage is segregated into meta data and raw data +for the purposes of memory management. + | +
#H5FD_FLMAP_DEFAULT | Each memory usage type has its own free list. + | +
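A free-list map can be pictured as an array indexed by the requested usage type, whose entries name the free list that serves that request. The sketch below uses a hypothetical enum `mem_t` as a stand-in for the library's usage types. Placing global heap data on the raw data list in the dichotomy map is an assumption of this sketch.

```c
/* Hypothetical stand-ins for the memory usage types listed above. */
typedef enum {
    MEM_SUPER, MEM_BTREE, MEM_DRAW, MEM_GHEAP, MEM_LHEAP, MEM_OHDR,
    MEM_NTYPES
} mem_t;

/* Single: every request is served from one free list. */
const mem_t flmap_single[MEM_NTYPES] = {
    MEM_SUPER, MEM_SUPER, MEM_SUPER, MEM_SUPER, MEM_SUPER, MEM_SUPER
};

/* Dichotomy: one list for metadata, one for raw data
 * (global heap grouped with raw data here, by assumption). */
const mem_t flmap_dichotomy[MEM_NTYPES] = {
    MEM_SUPER, MEM_SUPER, MEM_DRAW, MEM_DRAW, MEM_SUPER, MEM_SUPER
};

/* Default: each usage type has its own free list. */
const mem_t flmap_default[MEM_NTYPES] = {
    MEM_SUPER, MEM_BTREE, MEM_DRAW, MEM_GHEAP, MEM_LHEAP, MEM_OHDR
};

/* Look up the free list that serves a request of the given type. */
mem_t free_list_for(const mem_t map[MEM_NTYPES], mem_t type)
{
    return map[type];
}
```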
static haddr_t alloc (H5FD_t *file, H5FD_mem_t type, hsize_t size) |
+The file argument is the file from which space is to be allocated, type is the type of +memory being requested (from the list above) without being mapped according to the freelist +map and size is the number of bytes being requested. The library is allowed to allocate large +chunks of storage and manage them in a layer above the file driver (although the current library +doesn't do that). The allocation function should return a format address for the first byte +allocated. The allocated region extends from that address for size bytes. If the request cannot +be honored then the undefined address value is returned (#HADDR_UNDEF). The first call to this +function for a file which has never had memory allocated must return a format address of zero +or #HADDR_UNDEF since this is how the library allocates space for the userblock and/or superblock. | +
static herr_t free (H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size) |
+The file argument is the file for which space is being freed; type is the type of object being +freed (from the list above) without being mapped according to the freelist map; addr is the first +format address to free; and size is the size in bytes of the region being freed. The region being +freed may refer to just part of the region originally allocated and/or may cross allocation boundaries +provided all regions being freed have the same usage type. However, the library will never attempt +to free regions which have already been freed or which have never been allocated. | +
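The allocation contract described above (the first allocation starts at format address zero, and failure is reported with the undefined address) can be sketched with a simple end-of-address bump allocator. The names `bump_file_t`, `bump_alloc`, and `ADDR_UNDEF` are hypothetical. The sketch ignores the usage type and never reuses freed space, which a real driver would not necessarily do.

```c
#include <stdint.h>

#define ADDR_UNDEF UINT64_MAX /* hypothetical stand-in for HADDR_UNDEF */

typedef struct {
    uint64_t eoa;     /* first unallocated format address */
    uint64_t maxaddr; /* size of the address space the driver can handle */
} bump_file_t;

/* Allocate size bytes at the current end of the allocated address space.
 * Returns the first address of the region, or ADDR_UNDEF if the request
 * does not fit. Assumes eoa <= maxaddr on entry. */
uint64_t bump_alloc(bump_file_t *file, uint64_t size)
{
    uint64_t addr;
    if (size > file->maxaddr - file->eoa)
        return ADDR_UNDEF;
    addr = file->eoa;
    file->eoa += size;
    return addr;
}
```

The overflow check is written as a subtraction so that a very large size cannot wrap around the address arithmetic.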
static haddr_t get_eoa (H5FD_t *file) |
+This function returns the current value of the EOA marker for the specified file. | +
static herr_t read (H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buf) |
+The read function reads data from file file beginning at address addr and continuing +for size bytes into the buffer buf supplied by the caller. | +
static herr_t write (H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buf) |
+The write function transfers data +in the opposite direction. | +
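For a memory-backed driver, the read and write callbacks reduce to bounds-checked copies. The following is a minimal sketch under that assumption; `core_file_t`, `core_read`, and `core_write` are hypothetical names, the type and dxpl arguments are omitted, and plain `int` stands in for `herr_t`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory file: a buffer and the number of valid bytes. */
typedef struct {
    unsigned char *mem;
    uint64_t       eof;
} core_file_t;

/* Return non-zero if [addr, addr + size) lies inside the file. */
static int core_in_bounds(const core_file_t *file, uint64_t addr, uint64_t size)
{
    return addr <= file->eof && size <= file->eof - addr;
}

/* Copy size bytes starting at format address addr into buf. */
int core_read(const core_file_t *file, uint64_t addr, uint64_t size, void *buf)
{
    if (!core_in_bounds(file, addr, size))
        return -1;
    memcpy(buf, file->mem + addr, (size_t)size);
    return 0;
}

/* Copy size bytes from buf to format address addr. */
int core_write(core_file_t *file, uint64_t addr, uint64_t size, const void *buf)
{
    if (!core_in_bounds(file, addr, size))
        return -1;
    memcpy(file->mem + addr, buf, (size_t)size);
    return 0;
}
```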
static herr_t flush (H5FD_t *file) |
+Flush all data for file file to storage. | +
static herr_t query (const H5FD_t *file, unsigned long *flags) |
+This function is called by the library to query which optimizations to enable for I/O to this driver. | +
H5FD_FEAT_AGGREGATE_METADATA (0x00000001) |
+Defining the H5FD_FEAT_AGGREGATE_METADATA for a VFL driver means that the library will attempt to allocate +a larger block for metadata and then sub-allocate each metadata request from that larger block. | +
H5FD_FEAT_ACCUMULATE_METADATA (0x00000002) |
+Defining the H5FD_FEAT_ACCUMULATE_METADATA for a VFL driver means that the library will attempt to cache +metadata as it is written to the file and build up a larger block of metadata to eventually pass to the +VFL 'write' routine. | +
H5FD_FEAT_DATA_SIEVE (0x00000004) |
+Defining the H5FD_FEAT_DATA_SIEVE for a VFL driver means that the library will attempt to cache raw data + as it is read from/written to a file in a "data sieve" buffer. | +
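A query callback simply reports a bitwise OR of the feature flags the driver supports. The sketch below reuses the numeric values listed above but defines them under hypothetical `FEAT_*` names so that it compiles without the HDF5 headers. `demo_query` is likewise an invented name, and `int` stands in for `herr_t`.

```c
/* Flag values as listed above, under hypothetical local names. */
#define FEAT_AGGREGATE_METADATA  0x00000001UL
#define FEAT_ACCUMULATE_METADATA 0x00000002UL
#define FEAT_DATA_SIEVE          0x00000004UL

/* Report which library optimizations this hypothetical driver supports:
 * metadata aggregation and data sieving, but not metadata accumulation. */
int demo_query(const void *file, unsigned long *flags)
{
    (void)file; /* the answer does not depend on the particular file */
    if (flags) {
        *flags = 0;
        *flags |= FEAT_AGGREGATE_METADATA;
        *flags |= FEAT_DATA_SIEVE;
    }
    return 0;
}
```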
hid_t H5FDregister (H5FD_class_t *cls) |
+The driver described by struct cls is registered with the library and an ID number for the driver is returned. | +
const char *name |
+A pointer to a constant, null-terminated driver name to be used for debugging purposes. | +
size_t fapl_size |
+The size in bytes of the file access mode structure or zero if the driver supplies a copy function +or doesn't define the structure. | +
void *(*fapl_copy)(const void *fapl) |
+An optional function which copies a driver-defined file access mode structure. This field takes +precedence over fapl_size when both are defined. | +
void (*fapl_free)(void *fapl) |
+An optional function to free the driver-defined file access mode structure. If null, then the +library calls the C free function to free the structure. | +
size_t dxpl_size |
+The size in bytes of the data transfer mode structure or zero if the driver supplies a copy +function or doesn't define the structure. | +
void *(*dxpl_copy)(const void *dxpl) |
+An optional function which copies a driver-defined data transfer mode structure. This field +takes precedence over dxpl_size when both are defined. | +
void (*dxpl_free)(void *dxpl) |
+An optional function to free the driver-defined data transfer mode structure. If null, then +the library calls the C free function to free the structure. | +
H5FD_t *(*open)(const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr) |
+The function which opens or creates a new file. | +
herr_t (*close)(H5FD_t *file) |
+The function which ends access to a file. | +
int (*cmp)(const H5FD_t *f1, const H5FD_t *f2) |
+An optional function to determine whether two open files have the same key. If this function +is not present then the library assumes that two files will never be the same. | +
herr_t (*query)(const H5FD_t *f, unsigned long *flags) |
+An optional function to determine which library optimizations a driver can support. | +
haddr_t (*alloc)(H5FD_t *file, H5FD_mem_t type, hsize_t size) |
+An optional function to allocate space in the file. | +
herr_t (*free)(H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size) |
+An optional function to free space in the file. | +
haddr_t (*get_eoa)(H5FD_t *file) |
+A function to query how much of the format address space has been allocated. | +
herr_t (*set_eoa)(H5FD_t *file, haddr_t) |
+A function to set the end of address space. | +
haddr_t (*get_eof)(H5FD_t *file) |
+A function to return the current end-of-file marker value. | +
herr_t (*read)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buffer) |
+A function to read data from a file. | +
herr_t (*write)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buffer) |
+A function to write data to a file. | +
herr_t (*flush)(H5FD_t *file) |
+A function which flushes cached data to the file. | +
H5FD_mem_t fl_map[H5FD_MEM_NTYPES] |
+An array which maps a file allocation request type to a free list. | +
herr_t H5FDunregister (hid_t driver) |
+Where driver is the ID number returned when the driver was registered. | +
void * H5Pget_driver_data (hid_t fapl) |
+This function is intended to be used by driver functions, not applications. It returns a pointer +directly into the file access property list fapl which is a copy of the driver's file access mode +originally provided to the H5Pset_driver function. If its argument is a data transfer property list +dxpl then it returns a pointer to the driver-specific data transfer information instead. + | +
Initial document, 18 November 1999.
- -Updated on 10/24/00, Quincey Koziol
- -Added the section “Programming Note for C++ Developers Using C -Functions,” 08/23/2012, Mark Evans - - - -
-
-
- - -
-The HDF5 file format describes how HDF5 data structures and dataset raw -data are mapped to a linear format address space and the HDF5 -library implements that bidirectional mapping in terms of an -API. However, the HDF5 format specifications do not indicate how -the format address space is mapped onto storage and HDF (version 5 and -earlier) simply mapped the format address space directly onto a single -file by convention. - -
--Since early versions of HDF5 it became apparent that users want the ability to -map the format address space onto different types of storage (a single file, -multiple files, local memory, global memory, network distributed global -memory, a network protocol, etc.) with various types of maps. For -instance, some users want to be able to handle very large format address -spaces on operating systems that support only 2GB files by partitioning the -format address space into equal-sized parts each served by a separate -file. Other users want the same multi-file storage capability but want to -partition the address space according to purpose (raw data in one file, object -headers in another, global heap in a third, etc.) in order to improve I/O -speeds. - -
--In fact, the number of storage variations is probably larger than the -number of methods that the HDF5 team is capable of implementing and -supporting. Therefore, a Virtual File Layer API is being -implemented which will allow application teams or departments to design -and implement their own mapping between the HDF5 format address space -and storage, with each mapping being a separate file driver -(possibly written in terms of other file drivers). The HDF5 team will -provide a small set of useful file drivers which will also serve as -examples for those who which to write their own: - -
-H5FD_SEC2
-read
and write
to perform I/O to a single file. All I/O
-requests are unbuffered although the driver does optimize file seeking
-operations to some extent.
-
-H5FD_STDIO
-H5FD_CORE
-H5FD_MPIIO
-H5FD_FAMILY
-h5repart
tool can be used to change the sizes of the
-family members when stored as files or to convert a family of files to a
-single file or vice versa.
-
-H5FD_SPLIT
--Most application writers will use a driver defined by the HDF5 library or -contributed by another programming team. This chapter describes how existing -drivers are used. - -
- - - --Each file driver is defined in its own public header file which should -be included by any application which plans to use that driver. The -predefined drivers are in header files whose names begin with -`H5FD' followed by the driver name and `.h'. The `hdf5.h' -header file includes all the predefined driver header files. - -
-
-Once the appropriate header file is included a symbol of the form
-`H5FD_' followed by the upper-case driver name will be the driver
-identification number.(1) However, the
-value may change if the library is closed (e.g., by calling
-H5close
) and the symbol is referenced again.
-
-
-In order to create or open a file one must define the method by which the
-storage is accessed(2) and does so by creating a file access property list(3) which is passed to the H5Fcreate
or
-H5Fopen
function. A default file access property list is created by
-calling H5Pcreate
and then the file driver information is inserted by
-calling a driver initialization function such as H5Pset_fapl_family
:
-
-
-hid_t fapl = H5Pcreate(H5P_FILE_ACCESS); -size_t member_size = 100*1024*1024; /*100MB*/ -H5Pset_fapl_family(fapl, member_size, H5P_DEFAULT); -hid_t file = H5Fcreate("foo%05d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl); -H5Pclose(fapl); -- -
-Each file driver will have its own initialization function
-whose name is H5Pset_fapl_
followed by the driver name and which
-takes a file access property list as the first argument followed by
-additional driver-dependent arguments.
-
-
-An alternative to using the driver initialization function is to set the
-driver directly using the H5Pset_driver
function.(4) Its second argument is the file driver identifier, which may
-have a different numeric value from run to run depending on the order in which
-the file drivers are registered with the library. The third argument
-encapsulates the additional arguments of the driver initialization
-function. This method only works if the file driver writer has made the
-driver-specific property list structure a public datatype, which is
-often not the case.
-
-
-hid_t fapl = H5Pcreate(H5P_FILE_ACCESS); -static H5FD_family_fapl_t fa = {100*1024*1024, H5P_DEFAULT}; -H5Pset_driver(fapl, H5FD_FAMILY, &fa); -hid_t file = H5Fcreate("foo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl); -H5Pclose(fapl); -- -
-It is also possible to query the file driver information from a file access
-property list by calling H5Pget_driver
to determine the driver and then
-calling a driver-defined query function to obtain the driver information:
-
-
-hid_t driver = H5Pget_driver(fapl); -if (H5FD_SEC2==driver) { - /*nothing further to get*/ -} else if (H5FD_FAMILY==driver) { - hid_t member_fapl; - haddr_t member_size; - H5Pget_fapl_family(fapl, &member_size, &member_fapl); -} else if (....) { - .... -} -- - - -
-The H5Dread
and H5Dwrite
functions transfer data between
-application memory and the file. They both take an optional data transfer
-property list which has some general driver-independent properties and
-optional driver-defined properties. An application will typically perform I/O
-in one of three styles via the H5Dread
or H5Dwrite
function:
-
-
-Like file access properties in the previous section, data transfer properties -can be set using a driver initialization function or a general purpose -function. For example, to set the MPI-IO driver to use independent access for -I/O operations one would say: - -
- --hid_t dxpl = H5Pcreate(H5P_DATA_XFER); -H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT); -H5Dread(dataset, type, mspace, fspace, buffer, dxpl); -H5Pclose(dxpl); -- -
-The alternative is to initialize a driver defined C struct
and pass it
-to the H5Pset_driver
function:
-
-
-hid_t dxpl = H5Pcreate(H5P_DATA_XFER); -static H5FD_mpio_dxpl_t dx = {H5FD_MPIO_INDEPENDENT}; -H5Pset_driver(dxpl, H5FD_MPIO, &dx); -H5Dread(dataset, type, mspace, fspace, buffer, dxpl); -- -
-The transfer property list can be queried in a manner similar to the file -access property list: the driver provides a function (or functions) to return -various information about the transfer property list: - -
- --hid_t driver = H5Pget_driver(dxpl); -if (H5FD_MPIO==driver) { - H5FD_mpio_xfer_t xfer_mode; - H5Pget_dxpl_mpio(dxpl, &xfer_mode); -} else { - .... -} -- - - -
-The HDF5 specifications describe two things: the mapping of data onto a linear -format address space and the C API which performs the mapping. -However, the mapping of the format address space onto storage intentionally -falls outside the scope of the HDF5 specs. This is a direct result of the fact -that it is not generally possible to store information about how to access -storage inside the storage itself. For instance, given only the file name -`/arborea/1225/work/f%03d' the HDF5 library is unable to tell whether the -name refers to a file on the local file system, a family of files on the local -file system, a file on host `arborea' port 1225, a family of files on a -remote system, etc. - -
--Two ways which library could figure out where the storage is located are: -storage access information can be provided by the user, or the library can try -all known file access methods. This implementation uses the former method. - -
--In general, if a file was created with one driver then it isn't possible to -open it with another driver. There are of course exceptions: a file created -with MPIO could probably be opened with the sec2 driver, any file created -by the sec2 driver could be opened as a family of files with one member, -etc. In fact, sometimes a file must not only be opened with the same -driver but also with the same driver properties. The predefined drivers are -written in such a way that specifying the correct driver is sufficient for -opening a file. - -
- - --A driver is simply a collection of functions and data structures which are -registered with the HDF5 library at runtime. The functions fall into these -categories: - -
- --Some drivers need information about file access and data transfers which are -very specific to the driver. The information is usually implemented as a pair -of pointers to C structs which are allocated and initialized as part of an -HDF5 property list and passed down to various driver functions. There are two -classes of settings: file access modes that describe how to access the file -through the driver, and data transfer modes which are settings that control -I/O operations. Each file opened by a particular driver may have a different -access mode; each dataset I/O request for a particular file may have a -different data transfer mode. - -
--Since each driver has its own particular requirements for various settings, -each driver is responsible for defining the mode structures that it -needs. Higher layers of the library treat the structures as opaque but must be -able to copy and free them. Thus, the driver provides either the size of the -structure or a pair of function pointers for each of the mode types. - -
--Example: The family driver needs to know how the format address -space is partitioned and the file access property list to use for the -family members. - -
- --/* Driver-specific file access properties */ -typedef struct H5FD_family_fapl_t { - hsize_t memb_size; /*size of each member */ - hid_t memb_fapl_id; /*file access property list of each memb*/ -} H5FD_family_fapl_t; - -/* Driver specific data transfer properties */ -typedef struct H5FD_family_dxpl_t { - hid_t memb_dxpl_id; /*data xfer property list of each memb */ -} H5FD_family_dxpl_t; -- -
-In order to copy or free one of these structures the member file access -or data transfer properties must also be copied or freed. This is done -by providing a copy and close function for each structure: - -
--Example: The file access property list copy and close functions -for the family driver: - -
- --static void * -H5FD_family_fapl_copy(const void *_old_fa) -{ - const H5FD_family_fapl_t *old_fa = (const H5FD_family_fapl_t*)_old_fa; - H5FD_family_fapl_t *new_fa = malloc(sizeof(H5FD_family_fapl_t)); - assert(new_fa); - - memcpy(new_fa, old_fa, sizeof(H5FD_family_fapl_t)); - new_fa->memb_fapl_id = H5Pcopy(old_fa->memb_fapl_id); - return new_fa; -} - -static herr_t -H5FD_family_fapl_free(void *_fa) -{ - H5FD_family_fapl_t *fa = (H5FD_family_fapl_t*)_fa; - H5Pclose(fa->memb_fapl_id); - free(fa); - return 0; -} -- -
-Generally when a file is created or opened the file access properties
-for the driver are copied into the file pointer which is returned and
-they may be modified from their original value (for instance, the file
-family driver modifies the member size property when opening an existing
-family). In order to support the H5Fget_access_plist
function the
-driver must provide a fapl_get
callback which creates a copy of
-the driver-specific properties based on a particular file.
-
-
-Example: The file family driver copies the member size file -access property list into the return value: - -
- --static void * -H5FD_family_fapl_get(H5FD_t *_file) -{ - H5FD_family_t *file = (H5FD_family_t*)_file; - H5FD_family_fapl_t *fa = calloc(1, sizeof(H5FD_family_fapl_t*)); - - fa->memb_size = file->memb_size; - fa->memb_fapl_id = H5Pcopy(file->memb_fapl_id); - return fa; -} -- - - -
-The higher layers of the library expect files to have a name and allow the
-file to be accessed in various modes. The driver must be able to create a new
-file, replace an existing file, or open an existing file. Opening or creating
-a file should return a handle, a pointer to a specialization of the
-H5FD_t
struct, which allows read-only or read-write access and which
-will be passed to the other driver functions as they are
-called.(5)
-
-
-typedef struct { - /* Public fields */ - H5FD_class_t *cls; /*class data defined below*/ - - /* Private fields -- driver-defined */ - -} H5FD_t; -- -
-Example: The family driver requires handles to the underlying
-storage, the size of the members for this particular file (which might be
-different than the member size specified in the file access property list if
-an existing file family is being opened), the name used to open the file in
-case additional members must be created, and the flags to use for creating
-those additional members. The eoa
member caches the size of the format
-address space so the family members don't have to be queried in order to find
-it.
-
-
-/* The description of a file belonging to this driver. */ -typedef struct H5FD_family_t { - H5FD_t pub; /*public stuff, must be first */ - hid_t memb_fapl_id; /*file access property list for members */ - hsize_t memb_size; /*maximum size of each member file */ - int nmembs; /*number of family members */ - int amembs; /*number of member slots allocated */ - H5FD_t **memb; /*dynamic array of member pointers */ - haddr_t eoa; /*end of allocated addresses */ - char *name; /*name generator printf format */ - unsigned flags; /*flags for opening additional members */ -} H5FD_family_t; -- -
-Example: The sec2 driver needs to keep track of the underlying Unix
-file descriptor and also the end of format address space and current Unix file
-size. It also keeps track of the current file position and last operation
-(read, write, or unknown) in order to optimize calls to lseek
. The
-device
and inode
fields are defined on Unix in order to uniquely
-identify the file and will be discussed below.
-
-
-typedef struct H5FD_sec2_t { - H5FD_t pub; /*public stuff, must be first */ - int fd; /*the unix file */ - haddr_t eoa; /*end of allocated region */ - haddr_t eof; /*end of file; current file size*/ - haddr_t pos; /*current file I/O position */ - int op; /*last operation */ - dev_t device; /*file device number */ - ino_t inode; /*file i-node number */ -} H5FD_sec2_t; -- - - -
-All drivers must define a function for opening/creating a file. This -function should have a prototype which is: - -
--
-The file name name and file access property list fapl are
-the same as were specified in the H5Fcreate
or H5Fopen
-call. The flags are the same as in those calls also except the
-flag H5F_ACC_CREATE
is also present if the call was to
-H5Fcreate
and they are documented in the `H5Fpublic.h'
-file. The maxaddr argument is the maximum format address that the
-driver should be prepared to handle (the minimum address is always
-zero).
-
-Example: The sec2 driver opens a Unix file with the requested name -and saves information which uniquely identifies the file (the Unix device -number and inode). - -
- --static H5FD_t * -H5FD_sec2_open(const char *name, unsigned flags, hid_t fapl_id/*unused*/, - haddr_t maxaddr) -{ - unsigned o_flags; - int fd; - struct stat sb; - H5FD_sec2_t *file=NULL; - - /* Check arguments */ - if (!name || !*name) return NULL; - if (0==maxaddr || HADDR_UNDEF==maxaddr) return NULL; - if (ADDR_OVERFLOW(maxaddr)) return NULL; - - /* Build the open flags */ - o_flags = (H5F_ACC_RDWR & flags) ? O_RDWR : O_RDONLY; - if (H5F_ACC_TRUNC & flags) o_flags |= O_TRUNC; - if (H5F_ACC_CREAT & flags) o_flags |= O_CREAT; - if (H5F_ACC_EXCL & flags) o_flags |= O_EXCL; - - /* Open the file */ - if ((fd=open(name, o_flags, 0666))<0) return NULL; - if (fstat(fd, &sb)<0) { - close(fd); - return NULL; - } - - /* Create the new file struct */ - file = calloc(1, sizeof(H5FD_sec2_t)); - file->fd = fd; - file->eof = sb.st_size; - file->pos = HADDR_UNDEF; - file->op = OP_UNKNOWN; - file->device = sb.st_dev; - file->inode = sb.st_ino; - - return (H5FD_t*)file; -} -- - - -
-Closing a file simply means that all cached data should be flushed to the next -lower layer, the file should be closed at the next lower layer, and all -file-related data structures should be freed. All information needed by the -close function is already present in the file handle. - -
--
-The file argument is the handle which was returned by the open
-function, and the close
should free only memory associated with the
-driver-specific part of the handle (the public parts will have already been released by HDF5's virtual file layer).
-
-Example: The sec2 driver just closes the underlying Unix file, -making sure that the actual file size is the same as that known to the -library by writing a zero to the last file position it hasn't been -written by some previous operation (which happens in the same code which -flushes the file contents and is shown below). - -
- --static herr_t -H5FD_sec2_close(H5FD_t *_file) -{ - H5FD_sec2_t *file = (H5FD_sec2_t*)_file; - - if (H5FD_sec2_flush(_file)<0) return -1; - if (close(file->fd)<0) return -1; - free(file); - return 0; -} -- - - -
-Occasionally an application will attempt to open a single file more than one -time in order to obtain multiple handles to the file. HDF5 allows the files to -share information(6) but in order to -accomplish this HDF5 must be able to tell when two names refer to the same -file. It does this by associating a driver-defined key with each file opened -by a driver and comparing the key for an open request with the keys for all -other files currently open by the same driver. - -
--
-The driver may provide a function which compares two files f1 and
-f2 belonging to the same driver and returns a negative, positive, or
-zero value a la the strcmp
function.(7) If this
-function is not provided then HDF5 assumes that all calls to the open
-callback return unique files regardless of the arguments and it is up to the
-application to avoid doing this if that assumption is incorrect.
-
-Each time a file is opened the library calls the cmp
function to
-compare that file with all other files currently open by the same driver and
-if one of them matches (at most one can match) then the file which was just
-opened is closed and the previously opened file is used instead.
-
-
-Opening a file twice with incompatible flags will result in failure. For -instance, opening a file with the truncate flag is a two step process which -first opens the file without truncation so keys can be compared, and if no -matching file is found already open then the file is closed and immediately -reopened with the truncation flag set (if a matching file is already open then -the truncating open will fail). - -
--Example: The sec2 driver uses the Unix device and i-node as the -key. They were initialized when the file was opened. - -
-
-static int
-H5FD_sec2_cmp(const H5FD_t *_f1, const H5FD_t *_f2)
-{
-    const H5FD_sec2_t *f1 = (const H5FD_sec2_t*)_f1;
-    const H5FD_sec2_t *f2 = (const H5FD_sec2_t*)_f2;
-
-    if (f1->device < f2->device) return -1;
-    if (f1->device > f2->device) return 1;
-
-    if (f1->inode < f2->inode) return -1;
-    if (f1->inode > f2->inode) return 1;
-
-    return 0;
-}
-
-
-Some drivers may also need to store certain information in the file superblock -in order to be able to reliably open the file at a later date. This is done by -three functions: one to determine how much space will be necessary to store -the information in the superblock, one to encode the information, and one to -decode the information. These functions are optional, but if any one is -defined then the other two must also be defined. - -
--
-The sb_size function returns the number of bytes necessary to encode
-information needed later if the file is reopened. The sb_encode
-function encodes information from the file into buffer buf
-allocated by the caller. It also writes an 8-character string (plus null
-terminator) into the name argument, which should be a unique
-identification for the driver. The sb_decode function looks at
-the name, decodes data from the buffer buf, and updates the file
-argument with the new information, advancing *p in the process.
-
-The tricky part is that the file must be readable
-before the superblock information is decoded. File access modes fall outside
-the scope of the HDF5 file format, but they are placed inside the superblock
-for convenience.(8)
-
--Example: To be written later. - -
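As a rough illustration of the three superblock callbacks described above, the following sketch uses stand-in types rather than the real HDF5 ones. The struct `demo_sb_file_t`, the driver name `DEMODRV1`, the `demo_*` function names, and the little-endian layout are all assumptions made for this example; the real callbacks take an `H5FD_t *` and may differ in signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver state that must survive a close/reopen cycle. */
typedef struct demo_sb_file_t {
    uint64_t member_start; /* e.g. format address where a second file begins */
} demo_sb_file_t;

/* Hypothetical 8-character driver identification. */
#define DEMO_DRIVER_NAME "DEMODRV1"

/* sb_size: number of bytes the driver needs in the superblock. */
static size_t
demo_sb_size(const demo_sb_file_t *file)
{
    (void)file;
    return 8;
}

/* sb_encode: write the driver name (8 characters plus the terminator)
 * and encode the driver state into buf in little-endian order. */
static int
demo_sb_encode(const demo_sb_file_t *file, char *name, unsigned char *buf)
{
    int i;

    assert(file && name && buf);
    memcpy(name, DEMO_DRIVER_NAME, 9);
    for (i = 0; i < 8; i++)
        buf[i] = (unsigned char)((file->member_start >> (8 * i)) & 0xff);
    return 0;
}

/* sb_decode: check the name, then rebuild the driver state from buf. */
static int
demo_sb_decode(demo_sb_file_t *file, const char *name, const unsigned char *buf)
{
    uint64_t value = 0;
    int      i;

    assert(file && name && buf);
    if (strcmp(name, DEMO_DRIVER_NAME) != 0) return -1; /* not our data */
    for (i = 0; i < 8; i++)
        value |= (uint64_t)buf[i] << (8 * i);
    file->member_start = value;
    return 0;
}
```

Encoding with an explicit byte order, instead of copying the struct, keeps the stored bytes independent of the machine that wrote the file.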
- - --HDF5 does not assume that a file is a linear address space of bytes. Instead, -the library will call functions to allocate and free portions of the HDF5 -format address space, which in turn map onto functions in the file driver to -allocate and free portions of file address space. The library tells the file -driver how much format address space it wants to allocate and the driver -decides what format address to use and how that format address is mapped onto -the file address space. Usually the format address is chosen so that the file -address can be calculated in constant time for data I/O operations (which are -always specified by format addresses). - -
-
-
-
-The HDF5 format allows an optional userblock to appear before the actual HDF5
-data in such a way that if the userblock is removed from the file and
-everything remaining is shifted downward in the file address space, then the
-file is still a valid HDF5 file. The userblock size can be zero or any
-power of two greater than or equal to 512, and the file superblock begins
-immediately after the userblock.
-
--HDF5 allocates space for the userblock and superblock by calling an -allocation function defined below, which must return a chunk of memory at -format address zero on the first call. - -
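The userblock size rule can be checked with a few lines of C. This is only a sketch: the function names and the `DEMO_USERBLOCK_MIN` macro are hypothetical, and the check assumes the rule used by the HDF5 format, namely that a non-zero userblock size is a power of two of at least 512 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest non-zero userblock size (hypothetical macro name). */
#define DEMO_USERBLOCK_MIN 512u

/* Return true if size is zero or a power of two >= 512. */
static bool
demo_userblock_size_ok(uint64_t size)
{
    if (size == 0) return true;
    if (size < DEMO_USERBLOCK_MIN) return false;
    return (size & (size - 1)) == 0; /* exactly one bit set */
}

/* File offset at which the superblock begins: immediately after the
 * userblock. In format address terms this position is address zero. */
static uint64_t
demo_superblock_offset(uint64_t userblock_size)
{
    assert(demo_userblock_size_ok(userblock_size));
    return userblock_size;
}
```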
- - --The library makes many types of allocation requests: - -
-H5FD_MEM_SUPER
-H5FD_MEM_BTREE
-H5FD_MEM_DRAW
-H5FD_MEM_META
-H5FD_MEM_GROUP
-H5FD_MEM_GHEAP
-H5FD_MEM_LHEAP
-H5FD_MEM_OHDR
-
-When a chunk of memory is freed the library adds it to a free list and
-allocation requests are satisfied from the free list before requesting memory
-from the file driver. Each type of allocation request enumerated above has its
-own free list, but the file driver can specify that certain object types can
-share a free list. It does so by providing an array which maps a request type
-to a free list. If any value of the map is H5MF_DEFAULT
(zero) then the
-object's own free list is used. The special value H5MF_NOLIST
indicates
-that the library should not attempt to maintain a free list for that
-particular object type, instead calling the file driver each time an object of
-that type is freed.
-
-
-Mappings predefined in the `H5FDpublic.h' file are: -
H5FD_FLMAP_SINGLE
-H5FD_FLMAP_DICHOTOMY
-H5FD_FLMAP_DEFAULT
-
-Example: To make a map that manages object headers on one free list
-and everything else on another free list one might initialize the map with the
-following code: (the use of H5FD_MEM_SUPER
is arbitrary)
-
-
-H5FD_mem_t mt, map[H5FD_MEM_NTYPES];
-
-for (mt=0; mt<H5FD_MEM_NTYPES; mt++) {
-    map[mt] = (H5FD_MEM_OHDR==mt) ? mt : H5FD_MEM_SUPER;
-}
-
-
-If an allocation request cannot be satisfied from the free list then one of
-two things happens. If the driver defines an allocation callback then it is
-used to allocate space; otherwise new memory is allocated from the end of the
-format address space by incrementing the end-of-address marker.
-
--
-The file argument is the file from which space is to be allocated,
-type is the type of memory being requested (from the list above) without
-being mapped according to the freelist map and size is the number of
-bytes being requested. The library is allowed to allocate large chunks of
-storage and manage them in a layer above the file driver (although the current
-library doesn't do that). The allocation function should return a format
-address for the first byte allocated. The allocated region extends from that
-address for size bytes. If the request cannot be honored then the
-undefined address value is returned (HADDR_UNDEF
). The first call to
-this function for a file which has never had memory allocated must
-return a format address of zero or HADDR_UNDEF
since this is how the
-library allocates space for the userblock and/or superblock.
-
-Example: To be written later. - -
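A minimal sketch of an allocation callback, assuming a driver that simply hands out space from the end of the format address space. The types `demo_addr_t` and `demo_alloc_file_t`, the macro `DEMO_ADDR_UNDEF` (a stand-in for `HADDR_UNDEF`), and the `maxaddr` field are hypothetical and only mimic the shape of the callback described above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_addr_t;                 /* stand-in for haddr_t */
#define DEMO_ADDR_UNDEF ((demo_addr_t)-1)     /* stand-in for HADDR_UNDEF */

typedef struct demo_alloc_file_t {
    demo_addr_t eoa;     /* first address after the last one allocated */
    demo_addr_t maxaddr; /* largest EOA value this driver supports */
} demo_alloc_file_t;

/* Allocate size bytes at the current EOA and advance the EOA.
 * The usage type is ignored because this sketch keeps a single region. */
static demo_addr_t
demo_alloc(demo_alloc_file_t *file, int type, uint64_t size)
{
    demo_addr_t addr;

    (void)type;
    assert(file && file->eoa <= file->maxaddr);

    /* Refuse requests that would overflow or exceed the address limit */
    if (size > file->maxaddr - file->eoa) return DEMO_ADDR_UNDEF;

    addr = file->eoa;
    file->eoa += size;
    return addr;
}
```

Because a new file starts with an EOA of zero, the first successful call returns format address zero, which is what the library expects when it allocates the userblock and superblock.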
- - -
-When the library is finished using a certain region of the format address
-space it will return the space to the free list according to the type of
-memory being freed and the free list map described above. If the free list has
-been disabled for a particular memory usage type (according to the free list
-map) and the driver defines a free
callback then it will be
-invoked. The free
callback is also invoked for all entries on the free
-list when the file is closed.
-
-
-
-The file argument is the file for which space is being freed; type -is the type of object being freed (from the list above) without being mapped -according to the freelist map; addr is the first format address to free; -and size is the size in bytes of the region being freed. The region -being freed may refer to just part of the region originally allocated and/or -may cross allocation boundaries provided all regions being freed have the same -usage type. However, the library will never attempt to free regions which have -already been freed or which have never been allocated. -
-A driver may choose to not define the free
function, in which case
-format addresses will be leaked. This isn't normally a huge problem since the
-library contains a simple free list of its own and freeing parts of the format
-address space is not a common occurrence.
-
-
-Example: To be written later. - -
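One possible shape for a free callback, sketched with stand-in types: if the freed region is the last part of the allocated range the EOA marker is lowered, otherwise the space is simply leaked, which the text above notes is acceptable. The names `demo_free`, `demo_free_file_t`, and `demo_addr_t` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_addr_t; /* stand-in for haddr_t */

typedef struct demo_free_file_t {
    demo_addr_t eoa; /* first address after the last one allocated */
} demo_free_file_t;

/* Free the region [addr, addr+size). Returns 0 on success, -1 if the
 * region extends past the EOA and so was never allocated. */
static int
demo_free(demo_free_file_t *file, int type, demo_addr_t addr, uint64_t size)
{
    (void)type;
    assert(file);

    if (addr > file->eoa || size > file->eoa - addr) return -1;

    /* Only the tail of the address space can be handed back cheaply */
    if (addr + size == file->eoa)
        file->eoa = addr;

    /* Any other region is leaked in this sketch */
    return 0;
}
```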
- - --Each file driver must have some mechanism for setting and querying the end of -address, or EOA, marker. The EOA marker is the first format address -after the last format address ever allocated. If the last part of the -allocated address range is freed then the driver may optionally decrease the -eoa marker. - -
--
-This function returns the current value of the EOA marker for the specified -file. -
-Example: The sec2 driver just returns the current eoa marker value -which is cached in the file structure: - -
-
-static haddr_t
-H5FD_sec2_get_eoa(H5FD_t *_file)
-{
-    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
-    return file->eoa;
-}
-
-
-The eoa marker is initially zero when a file is opened and the library may set
-it to some other value shortly after the file is opened (after the superblock
-is read and the saved eoa marker is determined) or when allocating additional
-memory in the absence of an alloc
callback (described above).
-
-
-Example: The sec2 driver simply caches the eoa marker in the file -structure and does not extend the underlying Unix file. When the file is -flushed or closed then the Unix file size is extended to match the eoa marker. - -
-
-static herr_t
-H5FD_sec2_set_eoa(H5FD_t *_file, haddr_t addr)
-{
-    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
-    file->eoa = addr;
-    return 0;
-}
-
-
-These functions operate on data, transferring a region of the format address -space between memory and files. - -
- - - --A driver must specify two functions to transfer data from the library to the -file and vice versa. - -
--
-The read
function reads data from file file beginning at address
-addr and continuing for size bytes into the buffer buf
-supplied by the caller. The write
function transfers data in the
-opposite direction. Both functions take a data transfer property list
-dxpl which indicates the fine points of how the data is to be
-transferred and which comes directly from the H5Dread
or
-H5Dwrite
 function. Both functions receive the type of
-data being transferred, which may allow a driver to tune its behavior for
-different kinds of data.
-
-Both functions should return a negative value if they fail to transfer the -requested data, or non-negative if they succeed. The library will never -attempt to read from unallocated regions of the format address space. - -
-
-Example: The sec2 driver just makes system calls. It tries not to
-call lseek
if the current operation is the same as the previous
-operation and the file position is correct. It also fills the output buffer
-with zeros when reading between the current EOF and EOA markers and restarts
-system calls which were interrupted.
-
-
-static herr_t
-H5FD_sec2_read(H5FD_t *_file, H5FD_mem_t type/*unused*/, hid_t dxpl_id/*unused*/,
-               haddr_t addr, hsize_t size, void *buf/*out*/)
-{
-    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
-    ssize_t nbytes;
-
-    assert(file && file->pub.cls);
-    assert(buf);
-
-    /* Check for overflow conditions */
-    if (REGION_OVERFLOW(addr, size)) return -1;
-    if (addr+size>file->eoa) return -1;
-
-    /* Seek to the correct location */
-    if ((addr!=file->pos || OP_READ!=file->op) &&
-        file_seek(file->fd, (file_offset_t)addr, SEEK_SET)<0) {
-        file->pos = HADDR_UNDEF;
-        file->op = OP_UNKNOWN;
-        return -1;
-    }
-
-    /*
-     * Read data, being careful of interrupted system calls, partial results,
-     * and the end of the file.
-     */
-    while (size>0) {
-        do nbytes = read(file->fd, buf, size);
-        while (-1==nbytes && EINTR==errno);
-        if (-1==nbytes) {
-            /* error */
-            file->pos = HADDR_UNDEF;
-            file->op = OP_UNKNOWN;
-            return -1;
-        }
-        if (0==nbytes) {
-            /* end of file but not end of format address space */
-            memset(buf, 0, size);
-            size = 0;
-        }
-        assert(nbytes>=0);
-        assert((hsize_t)nbytes<=size);
-        size -= (hsize_t)nbytes;
-        addr += (haddr_t)nbytes;
-        buf = (char*)buf + nbytes;
-    }
-
-    /* Update current position */
-    file->pos = addr;
-    file->op = OP_READ;
-    return 0;
-}
-
-
-Example: The sec2 write
callback is similar except it updates
-the file EOF marker when extending the file.
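The write side might look roughly like the following. This is not the sec2 source: it is a sketch that uses plain POSIX calls and stand-in types, and the names `demo_io_file_t`, `demo_open`, and `demo_write` are hypothetical. It shows the two points made above: interrupted system calls are restarted, and the cached EOF marker is updated when a write extends the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct demo_io_file_t {
    int      fd;  /* underlying Unix file descriptor */
    uint64_t eoa; /* end-of-address marker set by the library */
    uint64_t eof; /* cached end of the underlying file */
} demo_io_file_t;

/* Create (or truncate) path and initialize the cached markers. */
static int
demo_open(demo_io_file_t *file, const char *path, uint64_t eoa)
{
    file->fd  = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    file->eoa = eoa;
    file->eof = 0;
    return file->fd < 0 ? -1 : 0;
}

/* Write size bytes from buf at format address addr. */
static int
demo_write(demo_io_file_t *file, uint64_t addr, size_t size, const void *buf)
{
    const char *p = (const char *)buf;
    ssize_t     nbytes;

    assert(file && buf);

    /* Never write outside the allocated format address space */
    if (addr > file->eoa || size > file->eoa - addr) return -1;
    if (lseek(file->fd, (off_t)addr, SEEK_SET) < 0) return -1;

    while (size > 0) {
        /* Restart the call if it was interrupted by a signal */
        do nbytes = write(file->fd, p, size);
        while (-1 == nbytes && EINTR == errno);
        if (nbytes <= 0) return -1;
        size -= (size_t)nbytes;
        addr += (uint64_t)nbytes;
        p    += nbytes;
    }

    /* The file grew if we wrote past the previous end of file */
    if (addr > file->eof) file->eof = addr;
    return 0;
}
```

Unlike the read example, this sketch always seeks before writing; caching the last position and operation, as sec2 does, would avoid that extra system call.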
-
-
-Some drivers may want to cache data in memory in order to make larger I/O
-requests to the underlying file and thus improve bandwidth. Such drivers
-should register a cache flushing function so that the library can ensure that
-data has been flushed out of the driver in response to the application
-calling H5Fflush.
-
-
-
-Flush all data for file file to storage. -
-Example: The sec2 driver doesn't cache any data but it also doesn't -extend the Unix file as aggressively as it should. Therefore, when finalizing a -file it should write a zero to the last byte of the allocated region so that -when reopening the file later the EOF marker will be at least as large as the -EOA marker saved in the superblock (otherwise HDF5 will refuse to open the -file, claiming that the data appears to be truncated). - -
-
-static herr_t
-H5FD_sec2_flush(H5FD_t *_file)
-{
-    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
-
-    if (file->eoa>file->eof) {
-        if (-1==file_seek(file->fd, file->eoa-1, SEEK_SET)) return -1;
-        if (write(file->fd, "", 1)!=1) return -1;
-        file->eof = file->eoa;
-        file->pos = file->eoa;
-        file->op = OP_WRITE;
-    }
-
-    return 0;
-}
-
-
-The library is capable of performing several generic optimizations on I/O, but -these types of optimizations may not be appropriate for a given VFL driver. -
- --Each driver may provide a query function to allow the library to query whether -to enable these optimizations. If a driver lacks a query function, the library -will disable all types of optimizations which can be queried. -
- --
-This function is called by the library to query which optimizations to enable -for I/O to this driver. These are the flags which are currently defined: - -
-Before a driver can be used the HDF5 library needs to be told of its -existence. This is done by registering the driver, which results in a driver -identification number. Instead of passing many arguments to the registration -function, the driver information is entered into a structure and the address -of the structure is passed to the registration function where it is -copied. This allows the HDF5 API to be extended while providing backward -compatibility at the source level. - -
--
-The driver described by struct cls is registered with the library and an -ID number for the driver is returned. -
-The H5FD_class_t
type is a struct with the following fields:
-
-
const char *name
-size_t fapl_size
-void *(*fapl_copy)(const void *fapl)
-fm_size
when both are defined.
-void (*fapl_free)(void *fapl)
-free
function to free the
-structure.
-size_t dxpl_size
-void *(*dxpl_copy)(const void *dxpl)
-xm_size
when both are
-defined.
-void (*dxpl_free)(void *dxpl)
-free
function to
-free the structure.
-H5FD_t *(*open)(const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr)
-herr_t (*close)(H5FD_t *file)
-int (*cmp)(const H5FD_t *f1, const H5FD_t *f2)
-int (*query)(const H5FD_t *f, unsigned long *flags)
-haddr_t (*alloc)(H5FD_t *file, H5FD_mem_t type, hsize_t size)
-herr_t (*free)(H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size)
-haddr_t (*get_eoa)(H5FD_t *file)
-herr_t (*set_eoa)(H5FD_t *file, haddr_t)
-haddr_t (*get_eof)(H5FD_t *file)
-herr_t (*read)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buffer)
-herr_t (*write)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buffer)
-herr_t (*flush)(H5FD_t *file)
-H5FD_mem_t fl_map[H5FD_MEM_NTYPES]
--Example: The sec2 driver would be registered as: - -
-
-static const H5FD_class_t H5FD_sec2_g = {
-    "sec2",                 /*name          */
-    MAXADDR,                /*maxaddr       */
-    NULL,                   /*sb_size       */
-    NULL,                   /*sb_encode     */
-    NULL,                   /*sb_decode     */
-    0,                      /*fapl_size     */
-    NULL,                   /*fapl_get      */
-    NULL,                   /*fapl_copy     */
-    NULL,                   /*fapl_free     */
-    0,                      /*dxpl_size     */
-    NULL,                   /*dxpl_copy     */
-    NULL,                   /*dxpl_free     */
-    H5FD_sec2_open,         /*open          */
-    H5FD_sec2_close,        /*close         */
-    H5FD_sec2_cmp,          /*cmp           */
-    H5FD_sec2_query,        /*query         */
-    NULL,                   /*alloc         */
-    NULL,                   /*free          */
-    H5FD_sec2_get_eoa,      /*get_eoa       */
-    H5FD_sec2_set_eoa,      /*set_eoa       */
-    H5FD_sec2_get_eof,      /*get_eof       */
-    H5FD_sec2_read,         /*read          */
-    H5FD_sec2_write,        /*write         */
-    H5FD_sec2_flush,        /*flush         */
-    H5FD_FLMAP_SINGLE,      /*fl_map        */
-};
-
-hid_t
-H5FD_sec2_init(void)
-{
-    if (!H5FD_SEC2_g) {
-        H5FD_SEC2_g = H5FDregister(&H5FD_sec2_g);
-    }
-    return H5FD_SEC2_g;
-}
-
-
-A driver can be removed from the library by unregistering it.
-
-
--
-Unregistering a driver makes it unusable for creating new file access or data -transfer property lists but doesn't affect any property lists or files that -already use that driver. - -
-
-
-
-
-If a C routine that takes a function pointer as an argument is
-called from within C++ code, the C routine should be allowed to return
-normally.
- -Examples of this kind of routine include callbacks such as
-H5Pset_elink_cb
and H5Pset_type_conv_cb
-and functions such as H5Tconvert
and
-H5Ewalk2
.
Exiting the routine in its normal fashion allows the HDF5 C -Library to clean up its work properly. In other words, if the C++ -application jumps out of the routine back to the C++ -“catch” statement, the library is not given the -opportunity to close any temporary data structures that were set -up when the routine was called. The C++ application should save -some state as the routine is started so that any problem that -occurs might be diagnosed.
- - - - - - - --
-This function is intended to be used by driver functions, not applications.
-It returns a pointer directly into the file access property list
-fapl
which is a copy of the driver's file access mode originally
-provided to the H5Pset_driver
function. If its argument is a data
-transfer property list dxpl
then it returns a pointer to the
-driver-specific data transfer information instead.
-
-The various private H5F_low_*
functions will be replaced by public
-H5FD*
functions so they can be called from drivers.
-
-
-All private functions H5F_addr_*
which operate on addresses will be
-renamed as public functions by removing the first underscore so they can be
-called by drivers.
-
-
-The haddr_t
address data type will be passed by value throughout the
-library. The original intent was that this type would eventually be a union of
-file address types for the various drivers and may become quite large, but
-that was back when drivers were part of HDF5. It will become an alias for an
-unsigned integer type (32 or 64 bits depending on how the library was
-configured).
-
-
-The various H5F*.c
driver files will be renamed H5FD*.c
and each
-will have a corresponding header file. All driver functions except the
-initializer and API will be declared static.
-
-
-This documentation didn't cover optimization functions which would be useful -to drivers like MPI-IO. Some drivers may be able to perform data pipeline -operations more efficiently than HDF5 and need to be given a chance to -override those parts of the pipeline. The pipeline would be designed to call -various H5FD optimization functions at various points which return one of -three values: the operation is not implemented by the driver, the operation is -implemented but failed in a non-recoverable manner, the operation is -implemented and succeeded. - -
-
-Various parts of HDF5 check only the top-level file driver and do
-something special if it is the MPI-IO driver. However, we might want to be
-able to put the MPI-IO driver under other drivers such as the raw part of a
-split driver or under a debug driver whose sole purpose is to accumulate
-statistics as it passes all requests through to the MPI-IO driver. Therefore
-we will probably need a function which takes a format address and/or object
-type and returns the driver which would have been used at the lowest level to
-process the request.
-
-
- --
The driver name is by convention and might -not apply to drivers which are not distributed with HDF5. -
The access method also indicates how to translate -the storage name to a storage server such as a file, network protocol, or -memory. -
The term -"file access property list" is a misnomer since storage isn't -required to be a file. -
This -function is overloaded to operate on data transfer property lists also, as -described below. -
Read-only access is only appropriate when opening an existing -file. -
For instance, writing data to one handle will cause -the data to be immediately visible on the other handle. -
The ordering is -arbitrary as long as it's consistent within a particular file driver. -
File access modes do not describe data, but rather -describe how the HDF5 format address space is mapped to the underlying -file(s). Thus, in general the mapping must be known before the file superblock -can be read. However, the user usually knows enough about the mapping for the -superblock to be readable and once the superblock is read the library can fill -in the missing parts of the mapping. -
- - - - -