Folder to HDF5 Conversion

This guide shows how to convert a folder to an HDF5 file using `folder2hdf5.py` and provides an example of using the resulting HDF5 file with a PyTorch dataset.

HDF5 is a dataset file format optimized for HPC workloads. It can greatly speed up your IO performance.

For the same dataset:

Before HDF5: about 70 GB on disk

After HDF5: 17 GB on disk

`folder2hdf5.py`

Install required packages:
```
pip install h5py
```
Run the conversion script:
```
python folder2hdf5.py
```
1. Make sure to change the input folder and the output HDF5 file name accordingly.
2. You can tune the `chunks` and `compression` parameters of `create_dataset` to get different trade-offs between file size and IO speed. There are many parameters to tune in general, and the values used in the example are only standard choices.
3. The example given here converts a mesh dataset. Because a mesh is not a single NumPy array (as an image is) but a combination of two or more NumPy arrays, such as faces and vertices, each mesh is converted to an HDF5 group containing two HDF5 datasets: one for faces and one for vertices.
  - So, if your dataset is an image dataset, you can convert it directly to an HDF5 dataset instead of an HDF5 group.
  - If you are not sure about the definitions of HDF5 group and HDF5 dataset, please check out HDF5's official tutorial.
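
The group-per-mesh layout described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `folder2hdf5.py`: the toy meshes, the group names (`mesh_0000`, ...), the dataset names (`vertices`, `faces`), and the gzip level are all assumptions chosen for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy meshes standing in for files loaded from the input folder:
# name -> (vertices of shape (V, 3), faces of shape (F, 3)).
meshes = {
    "mesh_0000": (
        np.random.rand(4, 3).astype(np.float32),
        np.array([[0, 1, 2], [0, 2, 3]], dtype=np.int64),
    ),
    "mesh_0001": (
        np.random.rand(3, 3).astype(np.float32),
        np.array([[0, 1, 2]], dtype=np.int64),
    ),
}

# Assumed output location; replace with your own output HDF5 file name.
h5_path = os.path.join(tempfile.mkdtemp(), "meshes.h5")

with h5py.File(h5_path, "w") as f:
    for name, (vertices, faces) in meshes.items():
        # One HDF5 group per mesh, holding two HDF5 datasets.
        group = f.create_group(name)
        # chunks=True lets h5py pick a chunk shape; gzip level 4 is an
        # arbitrary middle-ground setting, not a recommendation.
        group.create_dataset(
            "vertices", data=vertices, chunks=True,
            compression="gzip", compression_opts=4,
        )
        group.create_dataset(
            "faces", data=faces, chunks=True,
            compression="gzip", compression_opts=4,
        )

# Read one mesh back to check the layout.
with h5py.File(h5_path, "r") as f:
    mesh_names = sorted(f.keys())
    loaded_vertices = f["mesh_0000/vertices"][:]
    loaded_faces = f["mesh_0000/faces"][:]
    faces_compression = f["mesh_0000/faces"].compression
```

For an image dataset, the same `create_dataset` call could be made directly on the file object (`f.create_dataset(...)`) without creating a group first.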

PyTorch Dataset Sample

`breakingbad_hdf5.py` is a sample PyTorch dataset showcasing how to use the generated HDF5 file. Please check out the `_read_objs_from_h5` function.
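
For orientation, a map-style PyTorch dataset over such a file might look like the sketch below. It is not the code in `breakingbad_hdf5.py`: the class name `MeshH5Dataset`, the dataset names `vertices` and `faces`, and the toy file built at the end are assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np
import torch
from torch.utils.data import Dataset


class MeshH5Dataset(Dataset):
    """Hypothetical dataset: one HDF5 group per mesh, with 'vertices' and 'faces'."""

    def __init__(self, h5_path):
        self.h5_path = h5_path
        self._file = None  # opened lazily on first access
        # Read only the group names up front, then close the file again.
        with h5py.File(h5_path, "r") as f:
            self.names = sorted(f.keys())

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        # Opening here (not in __init__) means each DataLoader worker
        # process ends up with its own file handle.
        if self._file is None:
            self._file = h5py.File(self.h5_path, "r")
        group = self._file[self.names[idx]]
        vertices = torch.from_numpy(group["vertices"][:])
        faces = torch.from_numpy(group["faces"][:])
        return vertices, faces


# Build a tiny file so the example runs on its own.
h5_path = os.path.join(tempfile.mkdtemp(), "toy_meshes.h5")
with h5py.File(h5_path, "w") as f:
    for i, n_vertices in enumerate([4, 3]):
        group = f.create_group(f"mesh_{i:04d}")
        group.create_dataset(
            "vertices", data=np.zeros((n_vertices, 3), dtype=np.float32)
        )
        group.create_dataset(
            "faces", data=np.array([[0, 1, 2]], dtype=np.int64)
        )

dataset = MeshH5Dataset(h5_path)
vertices, faces = dataset[0]
```

Meshes usually differ in vertex and face counts, so the default `DataLoader` collation cannot stack them into one batch; a custom `collate_fn` or a batch size of 1 would be needed.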
