Better way of writing data to disk? #31

Closed
ali-ramadhan opened this issue Feb 9, 2019 · 9 comments
Labels: abstractions 🎨 (Whatever that means), feature 🌟 (Something new and shiny)

Comments

@ali-ramadhan
Member

ali-ramadhan commented Feb 9, 2019

Right now `FieldWriter` writes `field.data` using `write(filepath, array)`, so the size of the array (and the field type) is lost when writing. This is probably inevitable with raw binary output.

A better solution would be to write output as NetCDF.
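To make the problem concrete, here is a small sketch in Python, using the standard-library `struct` module purely for illustration (it is not Julia's `write` and not Oceananigans' `FieldWriter`). It contrasts a raw dump, which keeps only the values, with a hypothetical minimal header that also records the field's name and shape. The file layout and function names are made up for this example; NetCDF is a much richer, standardized version of the same idea, storing dimensions, variables, and attributes alongside the data.

```python
import os
import struct
import tempfile


def write_raw(path, values):
    # Raw dump: only the float64 values reach the file. Nothing records the
    # array's shape or what kind of field it was.
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_raw(path):
    # The reader must already know the element type (float64 here); the most
    # it can recover is a flat sequence of numbers.
    with open(path, "rb") as f:
        buf = f.read()
    return list(struct.unpack(f"<{len(buf) // 8}d", buf))


def write_self_describing(path, values, shape, name):
    # Hypothetical minimal header: name length, name, ndim, dims, then values.
    encoded = name.encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(encoded)))
        f.write(encoded)
        f.write(struct.pack(f"<I{len(shape)}I", len(shape), *shape))
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_self_describing(path):
    # Parse the header back, so the name and shape travel with the data.
    with open(path, "rb") as f:
        buf = f.read()
    (name_len,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    name = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (ndim,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    shape = struct.unpack_from(f"<{ndim}I", buf, offset)
    offset += 4 * ndim
    count = 1
    for n in shape:
        count *= n
    values = list(struct.unpack_from(f"<{count}d", buf, offset))
    return name, shape, values


# A 2x3 field stored flat in row-major order.
field_u = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "u.bin")
    write_raw(raw_path, field_u)
    raw_back = read_raw(raw_path)  # six numbers, but no shape and no name

    described_path = os.path.join(tmp, "u.dat")
    write_self_describing(described_path, field_u, (2, 3), "u")
    name, shape, values = read_self_describing(described_path)
```

The raw file round-trips the numbers but not the fact that they formed a 2x3 field named `u`; the header version recovers both, which is the property a self-describing format like NetCDF provides out of the box.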

ali-ramadhan added the feature 🌟 label Feb 9, 2019
ali-ramadhan added this to the v0.5 milestone Feb 13, 2019
@ali-ramadhan
Member Author

ali-ramadhan commented Feb 22, 2019

Decision was made: NetCDF first for now.

@ali-ramadhan
Member Author

ali-ramadhan commented Feb 25, 2019

There are two packages providing high-level interfaces for reading and writing NetCDF files: NetCDF.jl and NCDatasets.jl. I went with NetCDF.jl as it seemed older and perhaps more mature, but it would be good to keep track of NCDatasets.jl, as it seems to use data frames instead of just arrays.

@ali-ramadhan
Member Author

NetCDF.jl (as does NCDatasets.jl) has some heavy dependencies, like Conda, CMake, and HDF5, that must be built. I don't think there's any way around this; NetCDF output is a must.

@ali-ramadhan
Member Author

NetCDF.jl seems to be missing some features and isn't really being maintained (see JuliaGeo/NetCDF.jl#62 about saving time values, and JuliaGeo/NetCDF.jl#39). Maybe it's worth switching to NCDatasets.jl, which takes more of a data-frames approach to NetCDF, is actively maintained, and grew out of bugs that weren't being fixed in NetCDF.jl. Unfortunately, we're choosing between two relatively young packages. An alternative would be to use the much more mature netcdf4-python, but I'd rather not have to use PyCall...

ali-ramadhan reopened this Feb 28, 2019
@glwagner
Member

We may have to contribute to the development of whatever NetCDF package we choose to use. Let’s pick the project we’d most like to contribute to.

The developer of NCDatasets is an oceanographer. That’s a plus.

@glwagner
Member

By “dataframes” approach, do you mean the dictionary-like interface?

@ali-ramadhan
Member Author

Yeah, sorry, I thought it used DataFrames.jl or something, but it just uses Dicts for the attributes.

It might be good to see what CliMA.jl is planning to use so we don't end up building towards two different solutions.

@ali-ramadhan
Member Author

ali-ramadhan commented Feb 28, 2019

Output needs to be arbitrary. We may need to perform on-line analysis and output the result (example: turbulent dissipation rate, time-averages, slices of fields, point values, etc).

We should design an additional interface for Fields. The type of the field indicates the coordinates on which the field is defined, so we should design an interface that uses that information.

Originally posted by @glwagner in https://github.com/ali-ramadhan/Oceananigans.jl/pull/93#issuecomment-468290310

Just adding your comment here, as I think it raises two new questions:

  1. How do we integrate diagnostics with the output writing framework?
  2. Right now each NetCDF output file contains a single snapshot. Maybe it makes more sense to keep appending to an existing NetCDF file. This might also make (1) easier to address, especially if the diagnostics have a different output frequency than other fields.
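One way to picture both questions together: treat every output, whether a raw field or an on-line diagnostic, as a function of the model state with its own output interval, and append each new record along a time axis instead of starting a new file per snapshot. The Python sketch below is hypothetical throughout: `Output`, `AppendingWriter`, and the toy `step` are invented names, and an in-memory dict stands in for a single NetCDF file with an unlimited time dimension. It is an illustration of the design idea, not Oceananigans' output API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Output:
    # Hypothetical output spec: any function of the model state (a field copy,
    # a slice, a time average, ...) plus its own write interval in iterations.
    compute: Callable[[dict], object]
    interval: int


@dataclass
class AppendingWriter:
    # In-memory stand-in for ONE file with an unlimited time dimension:
    # each output keeps its own time axis and appends a record when due.
    outputs: Dict[str, Output]
    times: Dict[str, List[float]] = field(default_factory=dict)
    records: Dict[str, List[object]] = field(default_factory=dict)

    def maybe_write(self, iteration, time, state):
        for name, out in self.outputs.items():
            if iteration % out.interval == 0:
                self.times.setdefault(name, []).append(time)
                self.records.setdefault(name, []).append(out.compute(state))


def step(state):
    # Toy time step: every velocity value grows by 1.
    state["u"] = [v + 1.0 for v in state["u"]]


state = {"u": [0.0, 2.0]}
writer = AppendingWriter({
    # Full field snapshot every 2 iterations.
    "u": Output(compute=lambda s: list(s["u"]), interval=2),
    # Cheap diagnostic (domain mean) every iteration.
    "mean_u": Output(compute=lambda s: sum(s["u"]) / len(s["u"]), interval=1),
})
dt = 0.5
for iteration in range(5):
    writer.maybe_write(iteration, iteration * dt, state)
    step(state)

# writer.times["u"]      -> [0.0, 1.0, 2.0]
# writer.times["mean_u"] -> [0.0, 0.5, 1.0, 1.5, 2.0]
```

Because each output carries its own interval and time axis, a diagnostic written every step and a full field written less often can live in the same file without either one dictating the other's output frequency.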

@ali-ramadhan
Member Author

Closing this, as I feel like we've resolved it. The original issue was that we were writing binary output, and now we're writing NetCDF output.

I will open a new issue discussing our needs for better NetCDF output.

ali-ramadhan self-assigned this Mar 21, 2019
ali-ramadhan added a commit that referenced this issue Oct 19, 2020
Dry rising thermal bubble verification experiment

2 participants