Replies: 10 comments 2 replies
-
Since you are already using NCO, I assume that does not solve your problem.
-
There is also a Python package that wraps HDF5. You might see what it can do.
-
The original workflow uses the Python package. I'm exploring other options. I've opened a discussion at https://forum.hdfgroup.org/c/hdf5/8
-
The HDF Group responded at https://forum.hdfgroup.org/t/transfer-records-to-another-file-without-decompress-recompress/12960: "H5Dread_chunk() and H5Dwrite_chunk() allow raw chunk data to be accessed and written while bypassing some or all compression filters."
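For reference, the signatures of those two functions in the HDF5 C API (available since HDF5 1.10.2):

```c
herr_t H5Dread_chunk(hid_t dset_id, hid_t dxpl_id, const hsize_t *offset,
                     uint32_t *filters, void *buf);
herr_t H5Dwrite_chunk(hid_t dset_id, hid_t dxpl_id, uint32_t filters,
                      const hsize_t *offset, size_t data_size,
                      const void *buf);
```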
-
OK, if I understand correctly, the idea is to copy one dataset to another (in a different file, the same file, or both?) without decompressing/recompressing. Sure, why not. The problem is, we try to keep the netcdf-c API small, and this would require an extra function. Just to game this out, what would that function look like? Are we going to always copy the whole variable/dataset? (Subsetting would require decompression, since I don't know where the data are within the compressed chunks.) In that case:
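For discussion purposes, a whole-variable version might look like this (the name and signature are hypothetical; no such function exists in netcdf-c today):

```c
/* Hypothetical prototype, for discussion only: copy every chunk of a
 * variable in raw (still-compressed) form. Assumes the output variable
 * was defined with identical chunking and filters. */
int nc_copy_var_raw(int ncid_in, int varid_in, int ncid_out, int varid_out);
```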
Is that what we are talking about here? @czender if such a function existed, could NCO make good use of it? Would it be useful? Presumably NCO allows subsetting when copying, and this would only work when there is no subsetting, right? Another alternative would be to write a straight-up HDF5 program to do this. As we all know, netCDF-4 just writes regular HDF5 datasets, which can be opened and read with HDF5 as well as netCDF-4. Also, files created with HDF5 can be opened by netCDF-4. So perhaps that's the easiest path to a working implementation? Even if we agreed that the above function prototype was correct, there is still the step of convincing the netCDF developers that this is worthy of a new function in the API. It's not clear to me that it is worth it, so the case has to be made.
-
My motivation is helping NCEP improve their resource utilization by speeding up various workflows. One of those workflows spends ~3 hours on the task described in the first posting (copying the last 120 of 121 time records from one file to another). I suspect anyone using nccopy would appreciate the speedup too. What other data/information would make the case?
-
If this function existed, NCO could make good use of it. However, @dkokron's specific use case above involves hyperslabs, so the minimum required prototype to (potentially) eliminate decompression/recompression would be:
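A minimal sketch of such a hyperslab-capable prototype, modeled on the existing nc_get_vara/nc_put_vara family (again hypothetical, for discussion):

```c
/* Hypothetical prototype: raw copy of a hyperslab. Skipping the filter
 * pipeline is only possible when startp/countp align exactly with chunk
 * boundaries; otherwise the library would have to decompress. */
int nc_copy_vara_raw(int ncid_in, int varid_in,
                     const size_t *startp, const size_t *countp,
                     int ncid_out, int varid_out);
```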
-
I've attached an ncdump of one of the files that is the subject of my optimization efforts.
-
Some time ago, I wrote a program to show the chunking layout for HDF5 datasets.
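That program isn't reproduced here, but the idea can be sketched with the chunk-query API that HDF5 added in 1.10.5 (H5Dget_num_chunks/H5Dget_chunk_info); this is an illustrative sketch, not the program itself:

```c
#include <hdf5.h>
#include <stdio.h>

/* Illustrative sketch: print each chunk's logical offset, file address,
 * and stored (compressed) size. Requires HDF5 >= 1.10.5. */
static void show_chunk_layout(hid_t dset)
{
    hsize_t nchunks = 0;
    H5Dget_num_chunks(dset, H5S_ALL, &nchunks);

    for (hsize_t i = 0; i < nchunks; i++) {
        hsize_t  offset[H5S_MAX_RANK]; /* chunk origin in element space  */
        unsigned filter_mask;          /* filters skipped for this chunk */
        haddr_t  addr;                 /* byte address within the file   */
        hsize_t  size;                 /* stored (compressed) byte count */

        H5Dget_chunk_info(dset, H5S_ALL, i, offset, &filter_mask, &addr, &size);
        printf("chunk %llu: offset[0]=%llu addr=%llu stored_bytes=%llu\n",
               (unsigned long long)i, (unsigned long long)offset[0],
               (unsigned long long)addr, (unsigned long long)size);
    }
}
```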
-
I put together a prototype code (see attached; pardon the mess) for testing the performance benefit of using H5Dread_chunk/H5Dwrite_chunk to transfer one variable (wspd in the ncdump output attached above). The H5Dread_chunk/H5Dwrite_chunk approach took 58s. Note that we can't change the compression strategy with this approach.
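For context, the core of that approach boils down to a per-chunk loop like the sketch below (a simplified illustration, not the attached prototype; it assumes the destination dataset was created with identical chunking and filter settings):

```c
#include <hdf5.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified sketch: copy one chunk between two open datasets without
 * running the filter (compression) pipeline. "offset" is the chunk's
 * starting coordinates in dataset element space. Assumes dset_out was
 * created with the same chunking and filters as dset_in. */
static int copy_one_chunk(hid_t dset_in, hid_t dset_out, const hsize_t *offset)
{
    hsize_t  nbytes = 0;       /* stored (compressed) size of this chunk   */
    uint32_t filter_mask = 0;  /* which filters were skipped for the chunk */

    if (H5Dget_chunk_storage_size(dset_in, offset, &nbytes) < 0)
        return -1;

    void *buf = malloc((size_t)nbytes);
    if (!buf)
        return -1;

    /* Read the raw, still-compressed chunk bytes, then write them back
     * out unchanged, preserving the filter mask. */
    if (H5Dread_chunk(dset_in, H5P_DEFAULT, offset, &filter_mask, buf) < 0 ||
        H5Dwrite_chunk(dset_out, H5P_DEFAULT, filter_mask, offset,
                       (size_t)nbytes, buf) < 0) {
        free(buf);
        return -1;
    }

    free(buf);
    return 0;
}
```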
-
We have a workflow that copies the last 120 of 121 records from one file to another. The data are compressed and chunked, and each record is a chunk. Profiling shows the vast majority of time is spent decompressing and then recompressing the data. A faster approach would avoid the decompress/recompress cycle in the first place. I can see needing to decompress the data if the user wants to get at the real values, but I just want to copy from one file to another. I was thinking of a low-level block copy, something like what the 'dd' command would do. Is that possible with netCDF?
I'm using nco-5.2.4 built with Spack, running on a Zen 2 chip.
Example usage:
ncrcat -7 -d time,1,120 -L 4 file.in file.out
ncrcat -7 -d time,1,120 --cmp='shf|zst,4' file.in file.out