-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_FillValue needed when saving undefined values #228
Comments
That's the intended behaviour. If you have a variable with missing values, you have to declare a fill value. The 1e37 was inserted by the netcdf c library because the corresponding elements where not initialized. Consider the case in Julia when you have an array with non- initialized elements. They are not getting transformed to missings either. |
That's not the case! using NCDatasets
tvals = collect(map(Float64, 0:3))
var = [3.0, missing, 9.1, 4.2]
NCDataset("tmp.nc", "c") do ds
defVar(ds, "time", tvals, ("time",))
defVar(ds, "var", var, ("time",))
end
But the netcdf C library doesn't recognize the CF convention. If the CF convention were part of the netcdf C library, it should set
What you are saying is: In your example, But, in the example I initially posted, I think |
In order to have type stable code, one must know at the call of In the example you copied:
However, when you compare these two examples: example 1. using NCDatasets
tvals = collect(map(Float64, 0:2))
var = tvals
NCDataset("tmp.nc", "c") do ds
defDim(ds,"time", Inf)
defVar(ds, "time", tvals, ("time",))
v = defVar(ds, "var", Float64, ("time",))
v[3] = var[3] # <- Skipping other time steps
end example 2. using NCDatasets
tvals = collect(map(Float64, 0:2))
var = tvals
NCDataset("tmp.nc", "c") do ds
defDim(ds,"time", Inf)
defVar(ds, "time", tvals, ("time",))
v = defVar(ds, "var", Float64, ("time",))
v[3] = var[3] # <- Skipping other time steps
v[1] = 1
v[2] = 1
end In example 1, clearly _FillValue must be set but not in example 2. The function defVar does not know whether the user will define all elements or not.
The only way I see to implement this is to keep track of all elements that have been written to and add the attribute Setting _FillValue unconditionally, would be quite annoying as future users of these files would need to handle arrays with Union{Float64,Missing} even if there is no missing value. |
Yes, what you say makes a lot of sense. I now see the problem better. But,
Would it really cause any practical problems? Who would be annoyed in what way?
I don't get this point. In most use cases, you just don't care whether it's To me, So, I would say, set This is all consistent with the netcdf C library: It puts undefined values as 1e37. So, the CF convention should say that all variable shall have I notice that in this issue tracker, there is a problem with calling an external Python function with |
Having an array of
Additional point to consider: the CF convention does not allow _FillValue for coordinate variables (https://cfconventions.org/cf-conventions/cf-conventions.html#attribute-appendix). The user is free rename a variable, so that it becomes a coordinate variable after the variable is defined in the NetCDF file.
|
I'm aware of that, but do you use a netCDF array in performance critical loops? I wouldn't. I would copy a section of the array from the netCDF file into the main memory for performance: v = copy(ds["myvar"][:, :, 3,:]) # v is Array{Union{Float64,Missing}}
# work on v[:,:,:] in the loop So, if the performance degradation of Missing is really problematic for you, you would change the above code to v = replace(ds["myvar"][:,:,3,:], missing=>NaN) # v is Array{Float64}
# work on v in the loop. I mean, if you need performance, you copy the data from the netCDF file into a native array anyway. So, if the overhead of Missing really matters, you just replace Or, do you really mean that a netCDF array is as fast as a native array and so people actually use netCDF arrays in performance critical loops without copying them into native arrays?
But you have to copy the data from the netCDF file to a native array anyway. I mean, you have to copy
I agree that that's a real problem. (You could scan the coordinate variables and delete _FillValue before saving it, but it's ugly, I admit.) |
Once you index a If you use only small arrays, then the need to copy would not matter to you. But for very large arrays (or systems with low memory), this can be an issue (I had past issue report about this). Concerning the copy, assuming that you copy an array from a netcdf file (without fill value), currently the flow is:
With a fill value set it is:
The netCDF C code can only deal with |
I am closing this issue because I think that the current behavior best serves the most users. |
Sorry for my silence for this issue. This message is just for discussion. I don't mean to re-open the issue.
What do you mean by "this"? If the entire array doesn't fit into the memory, you want to use the netCDF array without copying it. But in that case, you'll use the netCDF array without copying it into the memory. That's what you are saying. I that right? But you don't want to that in a tight loop because a lot of time will be then spend to read pieces of the big array from the file. So, it seems to me you are confusing these things:
In cases 1 & 2, you handle In case 3, the performance degradation due to So, that was my argument. Performance hit due to |
Describe the bug
In some cases,
NCDatasets
writes its own "missing" values without setting_FillValue
. A netCDF file created like that doesn't work correctly. (A workaround is to explicitly set_FillValue
when you create the file.)To Reproduce
Environment
versioninfo()
: See below.using Pkg; Pkg.status(mode=PKGMODE_MANIFEST)
: See belowFull output
versioninfo()
:The text was updated successfully, but these errors were encountered: