Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open returns a corrupted Raster #884

Open
tiemvanderdeure opened this issue Feb 1, 2025 · 10 comments
Open

open returns a corrupted Raster #884

tiemvanderdeure opened this issue Feb 1, 2025 · 10 comments
Labels
bug Something isn't working

Comments

@tiemvanderdeure
Copy link
Collaborator

In some cases if I create a Raster with Rasters.create, and then open it, the returned Raster returns a NetCDF error on show (or pretty much any other operation).

using Rasters, RasterDataSources, ArchGDAL, NCDatasets
ras = Raster(WorldClim{Climate}, :tmin, month = 1)
ras2 = Rasters.create("myraster.nc", ras, force = true)
r = open(ras2; write =true) do dest
    dest .= 1
end
size(r)

This is pretty hard to reproduce for some reason and it depends on ras. Using ras = Raster(rand(X(1:10), Y(1:10))) it doesnt error
Stacktrace

ERROR: NetCDF error: NetCDF: Not a valid ID (NetCDF error code: -33)
Stacktrace:
 [1] check
   @ C:\Users\tsh371\.julia\packages\NCDatasets\LjXBn\src\errorhandling.jl:25 [inlined]
 [2] nc_inq_dimlen(ncid::Int32, dimid::Int32)
   @ NCDatasets C:\Users\tsh371\.julia\packages\NCDatasets\LjXBn\src\netcdf_c.jl:1329
 [3] #19
   @ C:\Users\tsh371\.julia\packages\NCDatasets\LjXBn\src\variable.jl:37 [inlined]
 [4] ntuple
   @ .\ntuple.jl:49 [inlined]
 [5] size
   @ C:\Users\tsh371\.julia\packages\NCDatasets\LjXBn\src\variable.jl:37 [inlined]
 [6] size
   @ C:\Users\tsh371\.julia\packages\Rasters\E2mWI\src\modifieddiskarray.jl:61 [inlined]
 [7] size(A::Raster{Union{…}, 2, Tuple{…}, Tuple{}, Rasters.ModifiedDiskArray{…}, Symbol, Metadata{…}, Missing})
   @ DimensionalData C:\Users\tsh371\.julia\packages\DimensionalData\5AYsf\src\array\array.jl:105 [8] top-level scope
@tiemvanderdeure tiemvanderdeure added the bug Something isn't working label Feb 1, 2025
@rafaqz
Copy link
Owner

rafaqz commented Feb 1, 2025

Hmm I guess it should return the original unopened object (its not corrupted, just closed)

The problem is you may want to return a value so it's a bit tricky.

cleanreturn is supposed to handle this, but... Probably we need to flatten the result and replace open arrays with the FileArray.

@tiemvanderdeure
Copy link
Collaborator Author

Still don't know exactly what is going on, but I figured out this doesn't happen when just lazily reading the same raster (only after Rasters.create) - but if I specify a missingval then for whatever reason it does!

using Rasters, RasterDataSources, ArchGDAL, NCDatasets
ras = Raster(WorldClim{Climate}, :tmin, month = 1, lazy = true)
ras2 = Rasters.create("myraster.nc", ras, force = true)
ras3 = Raster("myraster.nc"; lazy = true, missingval = missing)
ras4 = Raster("myraster.nc"; lazy = true)

for (i, r) in enumerate((ras, ras2, ras3, ras4))
    println("Opening ras$i")
    try 
        open(identity, r) |> size
    catch 
        println("Errored!")
    end
end
Opening ras1
Opening ras2
Errored!
Opening ras3
Errored!
Opening ras4

@rafaqz
Copy link
Owner

rafaqz commented Feb 3, 2025

I think the variability is whether it's wrapped in another array or not.

But do note that using the outputs of open has to be undefined behaviour unless we decide to flatten and reconstruct back to FileArray.

The whole point of the closure is that it closes the dataset at the end... And you are passing the dataset out (with identity) then operating on it.

@tiemvanderdeure
Copy link
Collaborator Author

Okay the second example with identity one would never actually use, but the original example actually is legit!

I think the main problem here is that you do a totally valid operation and then get a long stacktrace with a NetCDF error code: -33, where it isn't obvious at all that show errored but the actual operation worked fine. So maybe the solution would more be towards handling this in show, so that it doesn't error but just tell you the Raster is closed.

@tiemvanderdeure
Copy link
Collaborator Author

Hm the reason the last example doesn't error is that cleanreturn ends up calling Array on it - doesn't that kind of defeat the purpose of lazily loading it?

@rafaqz
Copy link
Owner

rafaqz commented Feb 4, 2025

I does! But..
You're not mean to return the whole array...

@tiemvanderdeure
Copy link
Collaborator Author

You're not mean to return the whole array...

Then we should at least document that.

If I don't know any of this (which untill yesterday I didn't, really) and I want to lazily multiply a file by 2, just from reading the docstring of open I would write something like

ras = Raster(filename; lazy = true)
open(ras; write =true) do dest
    dest .*= 2
end

And this works like a charm if ras is a .tif, but if it's a .nc I need to add return or else it gets loaded into memory (or the show errors).

I can see now why this is tricky to solve, but probably we can improve at least the documentation so that it's not as easy to go wrong - especially because that backend matters here.

@rafaqz
Copy link
Owner

rafaqz commented Feb 4, 2025

You don't need to use open to do anything lazily.

It shouldn't work with a tiff... I think it's not lazy?

The solution is a fairly easy with Flatten.jl, just need to do it

@tiemvanderdeure
Copy link
Collaborator Author

You don't need to use open to do anything lazily.

You do if you want to write to it? Or am I missing something obvious?

@rafaqz
Copy link
Owner

rafaqz commented Feb 4, 2025

But writing isn't lazy?

And again you shouldnt return the written values.

But I am convinced we should swap the arrays for FileArray if they are returned, I just need to finish the Zarr PR first as it changes a lot of this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants