String-valued dimension incorrectly loaded as matrix of characters #237
Comments
What is the output of ncdump -h file.nc?
George Datseris ***@***.***> wrote on Wed, 8 Nov 2023,
15:06:
*Describe the bug*
A colleague of mine who uses Python and xarray has sent me a .nc file.
One of the dimensions of the .nc file has string values (i.e., it is like a
list of names). When I try to load this file I get:
Dimensions
time = 4000
diagnostic = 19
ic = 101
string13 = 13
Variables
values (101 × 19 × 4000)
Datatype: Union{Missing, Float64} (Float64)
Dimensions: ic × diagnostic × time
Attributes:
_FillValue = NaN
time (4000)
Datatype: Union{Missing, Float64} (Float64)
Dimensions: time
Attributes:
_FillValue = NaN
ic (101)
Datatype: Int32 (Int32)
Dimensions: ic
diagnostic (13 × 19)
Datatype: Char (Char)
Dimensions: string13 × diagnostic
Attributes:
_Encoding = utf-8
and accessing the diagnostic variable gives:
julia> v = data["diagnostic"]; v[:]
13×19 Matrix{Char}:
's' 's' 's' 't' 't' … 's' 's' 'a' 'a' 'a'
'a' 'a' 'a' 'e' 'e' 'a' 'e' 'm' 'm' 'a'
'l' 'l' 'l' 'm' 'm' 'l' 'a' 'o' 'o' 'b'
't' 't' 't' 'p' 'p' 't' 'i' 'c' 'c' 'w'
'_' '_' '_' '_' '_' '_' 'c' '_' '_' '\0'
't' 's' 's' 's' 's' … 'f' 'e' 'm' 'E' '\0'
'o' 'u' 'u' 'u' 'u' 'o' '\0' 'a' 'Q' '\0'
't' 'b' 'b' 'b' 'b' 'r' '\0' 'x' '\0' '\0'
'\0' '_' '_' '_' '_' 'c' '\0' '\0' '\0' '\0'
'\0' 'N' 'S' 'N' 'S' '_' '\0' '\0' '\0' '\0'
'\0' 'A' 'A' 'A' 'A' … 't' '\0' '\0' '\0' '\0'
'\0' '\0' '\0' '\0' '\0' 'o' '\0' '\0' '\0' '\0'
'\0' '\0' '\0' '\0' '\0' 't' '\0' '\0' '\0' '\0'
*To Reproduce*
Please give me an email address I can give access to the file, as it is
not possible to share the data publicly on GitHub. Once the file is
downloaded, to reproduce simply do:
data = NCDataset("filename.nc")
v = data["diagnostic"]
v[:]
*Expected behavior*
The dimension values for "diagnostic" should be a vector of strings
instead of a matrix of chars.
I admit, I do not know where the problem comes from. My colleague insists
that he saves the data "correctly" with xarray and once he loads the data
he gets the dimension as a vector of strings.
*Environment*
- operating system: Windows 10
- Julia version: 1.9.3
- NCDatasets version: ⌅ [85f8d34a] NCDatasets v0.12.17 (currently
checking if problem persists in new version 0.13)
Hi, I do not know where I have to run this command; my shell does not have it.
Meanwhile, my colleague has given me a way to reproduce the problem. In Python's xarray do:
import numpy as np
import xarray as xr

Ntime = 4000
Nobs = 19
N = 101
data = np.empty((Ntime, Nobs, N))
observables = ['salt_tot', 'salt_sub_NA', 'salt_sub_SA', 'temp_sub_NA', 'temp_sub_SA', 'sst_NA', 'sst_SA', 'sss_NA', 'sss_SA', 'rho_sub_NA', 'rho_sub_SA', 'rho_NA', 'rho_SA', 'salt_forc', 'salt_forc_tot', 'seaice', 'amoc_max', 'amoc_EQ', 'aabw']
time_vector = 5.*np.arange(Ntime)
initial_cond = np.arange(Nobs)
ds = xr.Dataset({'values': (['time', 'diagnostic', 'ic'], data)}, coords={'time': time_vector, 'diagnostic': observables, 'ic': initial_cond})
ds.to_netcdf('file.nc')
Just stumbled upon this topic and checked the output on my machine. The Python code has a typo. It should have
Anyway, everything looks good on OS X 12.6.6 and Linux CentOS 7 using NCDatasets 0.12.17.
ncdump gives correct data here as well (I lowered the numbers for the time and ic dimensions):
Hopefully that helps.
@Datseris If you need ncdump in the future, here is some information for Windows users: https://docs.unidata.ucar.edu/netcdf-c/current/winbin.html
It helps me a lot when users provide this additional information, as ncdump is independent of NCDatasets (and xarray) and gives the metadata in the NetCDF file as it is stored. I know it can take some time to get these installed on Windows, but ncdump is really valuable for troubleshooting issues with NetCDF files. With the shell tool
@wobagi Thanks a lot for your input and for correcting @Datseris's example. I just installed xarray (2023.10.1) and I get:
So the data is indeed stored as a matrix of chars. Also the file is a NetCDF 3 file (NetCDF 3 does not support strings).
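To illustrate what "stored as a matrix of chars" means, here is a minimal sketch in plain Python (no NetCDF library involved). The helper names to_char_matrix and from_char_matrix are hypothetical and only serve this example. It assumes each name is padded with NUL characters to a common width, which is what the '\0' entries in the Julia output above suggest; the longest name in the example, salt_forc_tot, has 13 characters, matching the string13 dimension. Note that the sketch stores one name per row, whereas the Julia display shows one name per column.

```python
# Hypothetical helpers for illustration only; not part of xarray or NCDatasets.

def to_char_matrix(names, width=None):
    """Pad each name with NUL characters to a common width (one row per name)."""
    if width is None:
        width = max(len(name) for name in names)
    return [list(name.ljust(width, "\0")) for name in names]


def from_char_matrix(rows):
    """Join the characters of each row and drop the trailing NUL padding."""
    return ["".join(row).rstrip("\0") for row in rows]


names = ["salt_tot", "salt_sub_NA", "salt_forc_tot", "aabw"]
matrix = to_char_matrix(names)      # fixed-width character rows
decoded = from_char_matrix(matrix)  # back to a list of strings
```

A reader that wants strings from such a file has to apply the second step itself; a reader that returns the stored array unchanged yields the character matrix.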
Now the data is a vector of strings (as in @wobagi's case) and the format is NetCDF4 (note the data type of
I think that NCDatasets is correct to read a matrix as a matrix and a vector as a vector. (Maybe xarray should give the user a warning when NetCDF 4 features are "approximated" (when python-netCDF4 is not installed), as in this case.)
Thank you very much, you have proven concretely that this is not an issue with NCDatasets.jl. I will ask my colleague to update to NetCDF4.
Describe the bug
A colleague of mine who uses Python and xarray has sent me a .nc file. One of the dimensions of the .nc file has string values (i.e., it is like a list of names). When I try to load this file I get:
and accessing the diagnostic variable gives:
Each column here is a variable name, so each column should have been a string.
To Reproduce
Please give me an email address I can give access to the file, as it is not possible to share the data publicly on GitHub. Once the file is downloaded, to reproduce simply do:
Expected behavior
The dimension values for "diagnostic" should be a vector of strings instead of a matrix of chars.
I admit, I do not know where the problem comes from. My colleague insists that he saves the data "correctly" with xarray and once he loads the data he gets the dimension as a vector of strings.
Environment
⌅ [85f8d34a] NCDatasets v0.12.17
(currently checking if problem persists in new version 0.13)