Handle more second subdivisions #18

Open
st-bender opened this issue Nov 29, 2022 · 9 comments
Comments

@st-bender

Describe the bug

Some NetCDF files produced by Python's xarray have time units of nanoseconds since <datetime>.
These are not yet handled by NCDatasets via CFTime (JuliaGeo/NCDatasets.jl#192).
Support for additional subdivisions of the second in timedecode(), such as microseconds and nanoseconds, would be appreciated. This would also be consistent with Python's handling, see JuliaGeo/NCDatasets.jl#181 (comment).

To Reproduce

julia> using CFTime

julia> timedecode(1e9, "nanoseconds since 2000-01-01 00:00:00.001", "proleptic_gregorian")
ERROR: unknown units nanoseconds
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] timeunits(#unused#::Type{DateTimeProlepticGregorian}, units::String)
   @ CFTime ~/.julia/packages/CFTime/n09Um/src/CFTime.jl:446
 [3] timedecode(#unused#::Type{DateTimeProlepticGregorian}, data::Float64, units::String)
   @ CFTime ~/.julia/packages/CFTime/n09Um/src/CFTime.jl:494
 [4] timedecode(data::Float64, units::String, calendar::String; prefer_datetime::Bool)
   @ CFTime ~/.julia/packages/CFTime/n09Um/src/CFTime.jl:545
 [5] timedecode(data::Float64, units::String, calendar::String)
   @ CFTime ~/.julia/packages/CFTime/n09Um/src/CFTime.jl:543
 [6] top-level scope
   @ REPL[6]:1

Expected behavior

Same result as with seconds:

julia> timedecode(1, "seconds since 2000-01-01 00:00:00.001", "proleptic_gregorian")
2000-01-01T00:00:01.001

Environment

  • operating system: Ubuntu 18.04
  • Julia version: 1.6.7 and 1.8.3
  • CFTime: 0.1.2 and 0.1.3 (master)
@Alexander-Barth
Member

So far, the smallest time unit is milliseconds, as internally that is the precision of the time stamp. We follow the approach of the Dates module in Julia.

I am wondering if in your use-case, sub-millisecond precision is important?

As far as I can see, we could:

  1. simply round the time information to the closest millisecond
  2. implement different internal time units as it is done in numpy.
  3. integrate with different packages able to handle sub-millisecond precisions (like https://github.com/JeffreySarnoff/TimesDates.jl)
  4. declare it out-of-scope 😞
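Option 1 could look roughly like the sketch below. It uses only the Dates standard library, and `decode_nanoseconds` is a hypothetical helper name, not part of CFTime. It assumes the time origin has already been parsed into a `DateTime`.

```julia
using Dates

# Hypothetical helper (not part of CFTime): decode a count of nanoseconds
# relative to `origin`, rounding to the nearest millisecond.
function decode_nanoseconds(ns::Real, origin::DateTime)
    ms = round(Int64, ns / 1_000_000)   # nanoseconds -> nearest millisecond
    return origin + Millisecond(ms)
end

origin = DateTime(2000, 1, 1, 0, 0, 0, 1)   # 2000-01-01T00:00:00.001
decode_nanoseconds(1e9, origin)             # 2000-01-01T00:00:01.001
```

Anything below one millisecond is discarded here, which is the trade-off of this option.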

@st-bender
Author

Hi,

Thanks for your thoughts.

So far, the smallest time unit is milliseconds, as internally that is the precision of the time stamp. We follow the approach of the Dates module in Julia.

I am wondering if in your use-case, sub-millisecond precision is important?

At least for my use case it is not important, but it may be important for others. Netcdf files are produced that way by at least one package and maybe other tools. It would just be nice to be able to read them into Julia without too much fiddling. NCDatasets.jl (and Rasters.jl using the former) are the two I found that are at least starting points for having something similar to xarray. Both currently depend on CFTime.

As far as I can see, we could:

1. simply round the time information to the closest millisecond

Fine for me, and probably the easiest way until someone complains about missing time resolution. 🤷

2. implement different internal time units as it is done in [numpy](https://numpy.org/doc/stable/reference/arrays.datetime.html).

I guess that may be where it comes from; quite often the times in numpy are listed as datetime64[ns].

3. integrate with different packages able to handle sub-millisecond precisions (like https://github.com/JeffreySarnoff/TimesDates.jl)

Python's xarray also has two options, datetime and cftime. But that depends on how much time and resources you want to spend on implementing the solution.

4. declare it out-of-scope 😞

That would mean:

  1. Leave all the preprocessing steps to python/xarray, separately or via PyCall.

@st-bender
Author

Hi there,
It's been a while, any more ideas on it?
Meanwhile, I found out that numpy internally uses a 64-bit integer representation, hence datetime64[ns], and counting starts on 1970-01-01. This leaves about +/- 290 years that can be addressed that way. Using 64-bit integer milliseconds like Dates then gives about +/- 290_000_000 years.
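The ranges quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a year of 365.25 days, and `span_years` is a name made up for the illustration.

```julia
# Approximate span, in years, covered by a signed 64-bit count of ticks
# on either side of the epoch.
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # Julian year, an assumption

span_years(ticks_per_second) = typemax(Int64) / ticks_per_second / SECONDS_PER_YEAR

years_ns = span_years(1e9)   # nanosecond ticks: roughly 292 years
years_ms = span_years(1e3)   # millisecond ticks: roughly 292 million years
```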

Anyway, regardless of the internal representation, it would be nice if CFTime could support all SI prefixes. The CF convention states:

The acceptable units for time are listed in the udunits.dat file. The most commonly used of these strings (and their abbreviations) includes day (d), hour (hr, h), minute (min) and second (sec, s).

The prefixes can be found in https://docs.unidata.ucar.edu/udunits/current/udunits2-prefixes.xml. The easiest way would be to round to the nearest millisecond then.
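As a rough sketch of what prefix handling with rounding could look like, the following uses a small lookup table. The table covers only a few prefixes, and `MS_PER_UNIT`, `split_units` and `to_milliseconds` are hypothetical names, not CFTime's API.

```julia
# Hypothetical lookup table (not CFTime's API): milliseconds per unit for a
# few second-based unit names built from SI prefixes.
const MS_PER_UNIT = Dict(
    "seconds"      => 1e3,
    "milliseconds" => 1e0,
    "microseconds" => 1e-3,
    "nanoseconds"  => 1e-6,
)

# Split a CF-style units string such as "nanoseconds since 2000-01-01"
# into the unit name and the origin text.
function split_units(units::AbstractString)
    name, origin = split(units, " since ", limit = 2)
    return lowercase(strip(name)), strip(origin)
end

# Convert a value in the given unit to whole milliseconds, rounding to nearest.
to_milliseconds(value::Real, unit::AbstractString) =
    round(Int64, value * MS_PER_UNIT[unit])

unit, origin = split_units("nanoseconds since 2000-01-01 00:00:00.001")
to_milliseconds(1e9, unit)   # 1000
```

A complete version would need the full prefix list from the udunits file and the abbreviated unit spellings, which this sketch leaves out.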

@Alexander-Barth
Member

Here are some experiments that I did a while ago:
https://github.com/JuliaGeo/CFTime.jl/blob/flexible-resolution/test/flexible_resolution.jl

The idea is to just wrap the time as-is in a struct and use units and time origin as type parameters. Time units and origin would be flexible and we can use the type specialization of the julia compiler.
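To illustrate the idea without reproducing the linked experiments, here is a minimal sketch. `WrappedTime`, `ticks_per_ms` and `to_datetime` are hypothetical names invented for this example, and the layout is an assumption rather than the code in the flexible-resolution branch.

```julia
using Dates

# Hypothetical type for illustration only; not CFTime's actual definition.
# `unit` is a Symbol and `origin` a (year, month, day) tuple, both stored in
# the type rather than in each value.
struct WrappedTime{T<:Integer, unit, origin}
    count::T
end

# Ticks per millisecond for the units this sketch knows about.
ticks_per_ms(::Val{:millisecond}) = 1
ticks_per_ms(::Val{:microsecond}) = 1_000
ticks_per_ms(::Val{:nanosecond})  = 1_000_000

# Convert to Dates.DateTime, rounding to the nearest millisecond.
function to_datetime(t::WrappedTime{T, unit, origin}) where {T, unit, origin}
    ms = div(t.count, ticks_per_ms(Val(unit)), RoundNearest)
    return DateTime(origin...) + Millisecond(ms)
end

t = WrappedTime{Int64, :nanosecond, (2000, 1, 1)}(1_000_000_000)
to_datetime(t)   # 2000-01-01T00:00:01
```

Because unit and origin live in the type, each value carries only the raw count, and the compiler can specialize the conversion per unit.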

@Alexander-Barth
Member

Alexander-Barth commented Jan 14, 2025

The current master is a big internal change relative to the previous version (see https://juliageo.org/CFTime.jl/latest/#Internal-API).
In particular, the CFTime types now have additional type parameters.

I tried to keep the API compatible.
Does somebody have time to test the current master version before I make a new release (as version 0.1.5 or as 0.2.0)?

In my tests, the performance is also OK (in particular when compared to Python, as we have the flexibility of Python's cftime with the performance of numpy).

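| Module        | median  | minimum | mean    | std. dev. |
|---------------|---------|---------|---------|-----------|
| julia-CFTime  | 0.00958 | 0.00471 | 0.01377 | 0.02038   |
| julia-Dates   | 0.00989 | 0.00508 | 0.01332 | 0.01674   |
| python-cftime | 4.09601 | 3.96883 | 4.13992 | 0.12367   |
| python-numpy  | 0.01230 | 0.01196 | 0.01232 | 0.00017   |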

@rafaqz @felixcremer @meggart @simone-silvestri @visr

@rafaqz
Member

rafaqz commented Jan 14, 2025

Ahh ok, best not to push it through quickly then. I've started putting breaking changes in another breaking branch until the release so we don't hit this situation.

I can just comment out some tests in Rasters for now to take the pressure off, but we could also release the patch on a branch off the last released version

@Alexander-Barth
Member

Alexander-Barth commented Jan 14, 2025

OK, I made a separate branch from 0.1.3 with the zero function (zero_fun no pun intended :-))

Version 0.1.4:
JuliaRegistries/General#122969

@rafaqz
Member

rafaqz commented Jan 14, 2025

But the benchmarks here look pretty good to me. At JuliaEO you mentioned some type instability?

@Alexander-Barth
Member

Yes, if previously you had:

struct MyStruct
  dt::DateTimeStandard
end

you must now use:

struct MyStruct{T1,T2}
  dt::DateTimeStandard{T1,T2}
end

to make the field dt and the struct fully specified. The previous code would still run, but probably slower.
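The difference can be seen with a self-contained stand-in. `ParamDate`, `LooseHolder` and `TightHolder` are hypothetical names used only for this sketch; `ParamDate` merely mimics a type with two parameters and is not CFTime's definition.

```julia
# Hypothetical stand-in for a parametric date type; not CFTime's definition.
struct ParamDate{T1, T2}
    value::T1
end

# Field type omits the parameters, so it is abstract (a UnionAll).
struct LooseHolder
    dt::ParamDate
end

# Parameters are forwarded, so the field type is concrete once they are set.
struct TightHolder{T1, T2}
    dt::ParamDate{T1, T2}
end

isconcretetype(fieldtype(LooseHolder, :dt))               # false
isconcretetype(fieldtype(TightHolder{Int64, :ms}, :dt))   # true
```

With an abstract field type the compiler generally cannot infer the type of `dt` when it is read back from the struct, which is the usual source of the slowdown.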


3 participants