Support writing to FITS #25

sjperkins · 2024-03-27T08:37:46Z

As dask processes datasets in chunks, it would be useful to support writing to FITS files in chunks. The alternative is aggregating all chunks into a single large array before writing to disk, which is untenable at the data sizes we currently encounter.

In the reading case, each chunk is handled as follows:

xarray-fits/xarrayfits/fits.py

Lines 66 to 72 in 3360d94

    
    def _get_data_function(fits_proxy, h, i, dt):
        if fits_proxy.is_memory_mapped:
            data = fits_proxy.hdu_list[h].data[i]
        else:
            data = fits_proxy.hdu_list[h].section[i]
        return data.astype(dt.newbyteorder("="))

Either the data or the section attribute is accessed, depending on whether the file is memory mapped on a local filesystem or accessed remotely. Presumably these attributes can also be used to support chunk writes.
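For the memory-mapped case, a minimal sketch of what a chunk write through the data attribute could look like is given here. It assumes the target file already exists with an image HDU of the final shape, and that it is opened with astropy in update mode with memory mapping enabled. The helper name write_chunk and the small 8x8 test image are hypothetical and only serve the illustration; this is not existing xarray-fits code, and it does not cover the section (remote) path.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits


def write_chunk(path, hdu_index, index, chunk):
    """Hypothetical helper: write one chunk into a preallocated image HDU."""
    # mode="update" with memmap=True gives a writable, memory-mapped data array
    with fits.open(path, mode="update", memmap=True) as hdul:
        hdul[hdu_index].data[index] = chunk
        hdul.flush()


shape = (8, 8)
path = os.path.join(tempfile.mkdtemp(), "chunked.fits")

# Preallocate the image. Building it from an in-memory array of zeros is
# only acceptable for this small demo, not for genuinely large images.
fits.PrimaryHDU(data=np.zeros(shape, dtype=np.float32)).writeto(path)

# Stand-in for the chunks that dask would produce
full = np.arange(64, dtype=np.float32).reshape(shape)

for r in range(0, shape[0], 4):
    for c in range(0, shape[1], 4):
        index = (slice(r, r + 4), slice(c, c + 4))
        write_chunk(path, 0, index, full[index])

# Read the file back without memory mapping to check the result
with fits.open(path, memmap=False) as hdul:
    result = hdul[0].data.copy()
```

Reopening the file for every chunk keeps the sketch simple; a real implementation would more likely hold one open handle per worker, much as the reading path does through fits_proxy.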

One concern I have is whether writing from multiple threads/processes will be handled properly. This is probably OK in the remote case, and I expect that the OS will handle paging the writes between memory and disk in the memory mapped case.
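As a rough way to probe the threading concern, the sketch below has several threads write disjoint blocks into a single memory-mapped array and then reads the file back. It uses a plain numpy.memmap over a raw big-endian file as a stand-in for a memory-mapped FITS data array, so it says nothing about FITS headers, about multiple processes, or about the remote case. The names write_block, target and source are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np

shape = (16, 16)
chunk = 4
path = os.path.join(tempfile.mkdtemp(), "blocks.raw")

# Big-endian float32, matching the on-disk byte order of FITS image data
target = np.memmap(path, dtype=">f4", mode="w+", shape=shape)
source = np.arange(shape[0] * shape[1], dtype=np.float32).reshape(shape)


def write_block(index):
    """Copy one block; blocks are disjoint, so no two threads touch the same bytes."""
    target[index] = source[index]


blocks = [
    (slice(r, r + chunk), slice(c, c + chunk))
    for r in range(0, shape[0], chunk)
    for c in range(0, shape[1], chunk)
]

with ThreadPoolExecutor(max_workers=4) as pool:
    # list() forces completion and re-raises any exception from a worker
    list(pool.map(write_block, blocks))

target.flush()

# Re-open read-only and copy into memory to verify what reached the file
readback = np.array(np.memmap(path, dtype=">f4", mode="r", shape=shape))
```

The key assumption is that chunks never overlap, which holds for dask chunks of a single array. Overlapping writes, or several processes each holding their own mapping of the same file, would need separate testing.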

/cc @bennahugo @o-smirnov


sjperkins added the enhancement New feature or request label Mar 27, 2024

sjperkins mentioned this issue Apr 3, 2024

compare to kerchunk? #26

Open
