Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raster support #6

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Add raster support #6

wants to merge 10 commits into from

Conversation

asinghvi17
Copy link
Member

@asinghvi17 asinghvi17 commented Mar 27, 2023

This is a basic implementation of RasterDataSources.jl for NaturalEarth. With this, the code:

using NaturalEarth, Rasters, CairoMakie
# All NaturalEarth rasters are tiffs, 
# so we have to convert them manually to matrices of colors
# This might be better solved in Rasters.jl!
tiff = Raster(NaturalEarthRaster{:latest, 10, :HYP_HR}, 1)
tiff_image = Makie.RGBf.(eachslice(tiff, dims=3)...) ./ 255
image(tiff_image)

just works!
download-3

@asinghvi17 asinghvi17 requested review from rafaqz and haakon-e March 27, 2023 04:08
Copy link
Collaborator

@haakon-e haakon-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you put in a really great effort on this PR! Thanks for all the time you put into this.

I haven't had the chance to run the code yet, but I'll try to do so one of the next few days.

As a general comment; I'm not sure what kind of tests, if any, would be sensible to write for the code. Perhaps we could put an effort into some documentation in a future PR which downloads and plots some example rasters (and geojson files) to showcase usage. If we set strict==true when building docs, that would hopefully catch most issues, at least every time tests are run.
In that same future PR, we could add some docstrings as well e.g. to the NaturalEarthRaster struct and ne_raster method.

Comment on lines +29 to +41
function naturalearth(dataset_name::String)
if dataset_name ∈ available_artifacts
file_path = @artifact_str("$dataset_name/$dataset_name.geojson")
return GeoJSON.read(read(file_path, String))
# elseif dataset_name ∈ available_rasters
end

@assert isfile(file_path) """
`$dataset_name` is not a valid NaturalEarth.jl artifact!
Please search https://www.naturalearthdata.com for available
datasets.
"""
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this rewrite related to support for rasters, or just experimentation? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly to add support for Rasters in the future...wanted to have a template to build off.

# Implement the RasterDataSources.jl interface.
# With this, you can call e.g. # Rasters.Raster(NaturalEarthRaster{:latest, 10, :HYP_HR}, nothing)

struct NaturalEarthRaster{Version, Scale, Name} <: RasterDataSources.RasterDataSource end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is having the version, scale and name as type parameters a stylistic choice, performance gain (vs field names), or something else? .. I'm just curious!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way RasterDataSources works is that you have to pass a type in to RasterDataSources.getraster, and Rasters.jl has the same dispatch (Raster(T::Type{<: RasterDataSources.RasterDataSource}, layer; kwargs...)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wish we could keep them as fields, since that would allow a lot more validation, but oh well 🤷

Copy link
Member

@rafaqz rafaqz Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put less in the type parameter!

RasterDataSources.jl has the layer name as the second argument, and scale would be a keyword.

Otherwise you get compilation every time you use it. Having the name as second argument means a tuple of names can make a stack.

You can put as much of that as you want in the keywords really.

You just need at least one type for dispatch.

Copy link
Member

@rafaqz rafaqz Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, when there are multiple type parameters in RasterDataSources its usually because the data source is structurally different - they have different keyword arguments and separate dataset handling code. WorldClim{Weather} and WorldClim{Climate} dispatch to totally separate methods in different files.

Here you don't actually use them to dispatch different code, suggesting they probably should be keywords.

There are also a few in RasterDataSources.jl like ALWB that were just design mistakes and have unnecessary things in the type that could be keywords.

@haakon-e
Copy link
Collaborator

haakon-e commented Mar 27, 2023

I'm also a little confused at why the tests fail attempting to download the ne_50m_coastline from https://raw.githubusercontent.com/nvkelso/natural-earth-vector/v5.1.2/geojson/ne_50m_coastline.geojson
but I'd think that's unrelated to this PR. That file is one of the ones we didn't mark as lazy, so it should be installed by default when building the package.

@asinghvi17
Copy link
Member Author

Yeah, I'm kind of confused as well...will have to check that out locally.

return string(Name) * ".tif"
end

function RasterDataSources.getraster(T::Type{NaturalEarthRaster{Version, Scale, Name}}, layer = nothing) where {Version, Scale, Name}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't layer be Name ? that's how the rest of RasterDataSources.jl works unless I misunderstand something.

A NTuple{N,Symbol} for layer will then give you a RasterStack automatically, or just a NamedTuple returned from getraster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ne = NaturalEarth{10 #= scale =#}
Raster(ne, :HY_SR)

do you mean something like this, where each type represents a scale, and has its own set of layers we can pull from? That makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just have a scale keyword so there is no recompilation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raster(NaturalEarth, :HY_SR; scale=10)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so turns out that wont work! Because Rasters.jl has a fixed list of keywords to pass to getraster which does not include scale.

That's not going to "scale". I think we need a RasterDataSource to declare its keywords to make this more flexible.

Like:

getraster_keywords(NaturalEarth) = (:scale, :version) 

Then Rasters.jl can just pass those through and keep the rest.

Copy link
Member Author

@asinghvi17 asinghvi17 Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd also like to discuss the getraster interface more generally...the reliance on types kind of confused me.

Would love to chat about it at MakieCon :D

Copy link
Member

@rafaqz rafaqz Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With no types at all we have no dispatch. But do you mean types rather than constructed objects?

Partly its just historical, I didnt start the package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rewrite it all to use structs with fields.

Basically I organised and standardised the few initial sources as much as possible, added a few more, and have had very little time to put into it since.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


_zipfile_to_read(raster_name, zf) = first(filter(f -> splitdir(f.name)[2] == raster_name, zf.files))

function RasterDataSources.zippath(::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods will recompile with any change at all, probably not what you want.

I would guess Version will hardly ever lead to recompilation as people would stick to one version. But then you have the problem of the default, which is easier as a keyword.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't pay too much attention to keyword options, so will definitely look at that!


struct NaturalEarthRaster{Version, Scale, Name} <: RasterDataSources.RasterDataSource end

function ne_raster(; version = :latest, scale::Int = 10, name::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you instead pass these keywords through and not put them in the type, except maybe at most version? As that's more tied to the definition of the dataset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. This probably ties in with the question of which dataset to get anyway.

raster_path = RasterDataSources.rasterpath(T)
if !isfile(raster_path)
zip_path = RasterDataSources.zippath(T)
RasterDataSources._maybe_download(RasterDataSources.zipurl(T), zip_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending methods with underscores is risky, that's marking them as internal so I can change them in a patch version.

Maybe we should may maybe_download part of the API instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice, but as a utility function I could probably replicate it (I think it's just isfile(zip_path) ? zip_path : download(zipurl, zip_path)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/NaturalEarth.jl Outdated Show resolved Hide resolved
@asinghvi17
Copy link
Member Author

Thanks for the detailed review @rafaqz! I'll definitely look into the layer/kwarg approach.

Co-authored-by: Rafael Schouten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants