Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raster support #6

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,18 @@ authors = ["Haakon Ludvig Langeland Ervik (@haakon-e), Anshul Singhvi <anshulsin
version = "0.1.0"

[deps]
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
GeoJSON = "61d90e0f-e114-555e-ac52-39dfb47a3ef9"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
RasterDataSources = "3cb90ccd-e1b6-4867-9617-4276c8b2ca36"
Scratch = "6c6a2e73-6563-6170-7368-637461726353"
XML = "72c71f33-b9b6-44de-8c94-c961784809e2"
p7zip_jll = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"

[compat]
GeoJSON = "0.6"
RasterDataSources = "0.5"
XML = "0.1.3"
julia = "1.3"

[extras]
Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
# NaturalEarth.jl

[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://asinghvi17.github.io/NaturalEarth.jl/stable/)
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://asinghvi17.github.io/NaturalEarth.jl/dev/)
[![Build Status](https://github.com/asinghvi17/NaturalEarth.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/asinghvi17/NaturalEarth.jl/actions/workflows/CI.yml?query=branch%3Amain)
[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaGeo.github.io/NaturalEarth.jl/stable/)
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://JuliaGeo.github.io/NaturalEarth.jl/dev/)
[![Build Status](https://github.com/JuliaGeo/NaturalEarth.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/JuliaGeo/NaturalEarth.jl/actions/workflows/CI.yml?query=branch%3Amain)


This package provides a Julia interface to the [Natural Earth](http://www.naturalearthdata.com/) dataset. The Natural Earth dataset is a public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. It is ideal for small-scale thematic mapping and for creating static maps and illustrations.

Currently, this package provides a single function, `naturalearth`, which fetches any `.geojson` file from [this](https://github.com/nvkelso/natural-earth-vector/tree/master/geojson) repository. The function returns a `GeoJSON.FeatureCollection` object.
Currently, this package provides a single function, `naturalearth(dataset_name::String)`, which fetches any `.geojson` file from [this](https://github.com/nvkelso/natural-earth-vector/tree/master/geojson) repository. The function returns a `GeoJSON.FeatureCollection` object, from the [`GeoJSON.jl`](httos://github.com/JuliaGeo/GeoJSON.jl) package.

The data is downloaded on demand and cached using Julia's `Artifacts` system. This means that the first time you fetch a dataset, it will take a while to download, and you will need an internet connection. Subsequent calls will be much faster, even in a new session.

Expand Down
6 changes: 3 additions & 3 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,11 @@ DocMeta.setdocmeta!(NaturalEarth, :DocTestSetup, :(using NaturalEarth); recursiv
makedocs(;
modules=[NaturalEarth],
authors="Anshul Singhvi <[email protected]> and contributors",
repo="https://github.com/asinghvi17/NaturalEarth.jl/blob/{commit}{path}#{line}",
repo="https://github.com/JuliaGeo/NaturalEarth.jl/blob/{commit}{path}#{line}",
sitename="NaturalEarth.jl",
format=Documenter.HTML(;
prettyurls=get(ENV, "CI", "false") == "true",
canonical="https://asinghvi17.github.io/NaturalEarth.jl",
canonical="https://JuliaGeo.github.io/NaturalEarth.jl",
edit_link="main",
assets=String[],
),
Expand All @@ -20,6 +20,6 @@ makedocs(;
)

deploydocs(;
repo="github.com/asinghvi17/NaturalEarth.jl",
repo="github.com/JuliaGeo/NaturalEarth.jl",
devbranch="main",
)
221 changes: 213 additions & 8 deletions src/NaturalEarth.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,27 +3,42 @@ module NaturalEarth
import GeoJSON
using Pkg
using Pkg.Artifacts
using p7zip_jll
using Scratch
using Downloads
using XML
using RasterDataSources

download_cache = ""

"A list of the names of available artifacts"
const available_artifacts = collect(keys(
Artifacts.select_downloadable_artifacts(Artifacts.find_artifacts_toml(@__FILE__); include_lazy=true)
))

export naturalearth, bathymetry
export naturalearth, bathymetry, NaturalEarthRaster, ne_raster

"""
naturalearth(name::String)
naturalearth(dataset_name::String)

Load a NaturalEarth dataset as a `GeoJSON.FeatureCollection` object.

Valid names are found in `Artifacts.toml`.
We aim to support all datasets listed in https://github.com/nvkelso/natural-earth-vector/tree/master/geojson
"""
function naturalearth(name::String)
pth = @artifact_str("$name/$name.geojson")
@assert isfile(pth) "`$name` is not a valid NaturalEarth.jl artifact"
GeoJSON.read(read(pth, String))
end
function naturalearth(dataset_name::String)
if dataset_name ∈ available_artifacts
file_path = @artifact_str("$dataset_name/$dataset_name.geojson")
return GeoJSON.read(read(file_path, String))
# elseif dataset_name ∈ available_rasters
end

@assert isfile(file_path) """
`$dataset_name` is not a valid NaturalEarth.jl artifact!
Please search https://www.naturalearthdata.com for available
datasets.
"""
end
Comment on lines +29 to +41
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this rewrite related to support for rasters, or just experimentation? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly to add support for Rasters in the future...wanted to have a template to build off.


"""
bathymetry(contour::Int = 2000)
Expand All @@ -50,5 +65,195 @@ function bathymetry(contour::Int=2000)
end


# Get rasters from AWS

# function get_available_rasters(refresh = false)
# scratchspace = Scratch.@get_scratch!("rasters")
# xml_file = joinpath(scratchspace, "index.xml")
# # refresh the XML file
# if !isfile(xml_file) || refresh
# download("https://naturalearth.s3.amazonaws.com/", xml_file)
# end
# # read the document
# xml_doc = LightXML.parse_file(xml_file);
# # get the root node
# xml_root = LightXML.root(xml_doc);
# @assert name(xml_root) == "ListBucketResult" "The format of the NaturalEarth data dictionary at Amazon AWS has changed, or the file is invalid. Please file an issue at https://github.com/JuliaGeo/NaturalEarth.jl/issues."
# return deepcopy(attributes_dict(xml_root))
# end

# function available_rasters_dict(keys)
# retval = Dict{String, Vector{VersionNumber}}()
# for key_element in keys
# key_str = key_element[1]
# endswith(key_str, "/") && continue # discard directory specs
# if haskey(key_str, "raster")
# name = splitext(basename(key_str))[1]
# dirs = splitpath(name)

# version = if length(dirs) == 1 || !(occursin(".", dirs[1])) # no version number
# VersionNumber(typemax(UInt32))
# else
# VersionNumber(dirs[1])
# end

# # for i in 1
# end
# end

# Implement the RasterDataSources.jl interface.
# With this, you can call e.g. # Rasters.Raster(NaturalEarthRaster{:latest, 10, :HYP_HR}, nothing)

struct NaturalEarthRaster{Version, Scale, Name} <: RasterDataSources.RasterDataSource end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is having the version, scale and name as type parameters a stylistic choice, performance gain (vs field names), or something else? .. I'm just curious!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way RasterDataSources works is that you have to pass a type in to RasterDataSources.getraster, and Rasters.jl has the same dispatch (Raster(T::Type{<: RasterDataSources.RasterDataSource}, layer; kwargs...)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wish we could keep them as fields, since that would allow a lot more validation, but oh well 🤷

Copy link
Member

@rafaqz rafaqz Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put less in the type parameter!

RasterDataSources.jl has the layer name as the second argument, and scale would be a keyword.

Otherwise you get compilation every time you use it. Having the name as second argument means a tuple of names can make a stack.

You can put as much of that as you want in the keywords really.

You just need at least one type for dispatch.

Copy link
Member

@rafaqz rafaqz Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, when there are multiple type parameters in RasterDataSources its usually because the data source is structurally different - they have different keyword arguments and separate dataset handling code. WorldClim{Weather} and WorldClim{Climate} dispatch to totally separate methods in different files.

Here you don't actually use them to dispatch different code, suggesting they probably should be keywords.

There are also a few in RasterDataSources.jl like ALWB that were just design mistakes and have unnecessary things in the type that could be keywords.


function ne_raster(; version = :latest, scale::Int = 10, name::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you instead pass these keywords through and not put them in the type, except maybe at most version? As that's more tied to the definition of the dataset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. This probably ties in with the question of which dataset to get anyway.

NaturalEarthRaster{Symbol(version), scale, Symbol(name)}
end

const _NATURALEARTH_URI = RasterDataSources.URIs.URI(scheme="https", host="naturalearth.s3.amazonaws.com", path="/")

function _zippath(::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
return if Version == :latest || VersionNumber(string(Version)).major == typemax(UInt32)
"" # max version, so latest
else
string(Version) * "/"
end * if Scale == 110 || Scale == 50 || Scale == 10
string(Scale) * "m_raster"
else
error("Invalid scale: $Scale")
end * "/" * string(Name) * ".zip"
end

_zipfile_to_read(raster_name, zf) = first(filter(f -> splitdir(f.name)[2] == raster_name, zf.files))

function RasterDataSources.zippath(::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods will recompile with any change at all, probably not what you want.

I would guess Version will hardly ever lead to recompilation as people would stick to one version. But then you have the problem of the default, which is easier as a keyword.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't pay too much attention to keyword options, so will definitely look at that!

return joinpath(RasterDataSources.rasterpath(), "NaturalEarth", "zips", split(_zippath(NaturalEarthRaster{Version, Scale, Name}), "/")...)
end

function RasterDataSources.zipurl(::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
return RasterDataSources.URIs.URI(_NATURALEARTH_URI; path = "/" * _zippath(NaturalEarthRaster{Version, Scale, Name}))
end

function RasterDataSources.zipname(::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
return splitpath(_zippath(NaturalEarthRaster{Version, Scale, Name}))[end]
end

function RasterDataSources.rasterpath(T::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
return joinpath(RasterDataSources.rasterpath(), "NaturalEarth", split((_zippath(NaturalEarthRaster{Version, Scale, Name})), "/")[1:end-1]..., RasterDataSources.rastername(T))
end

function RasterDataSources.rastername(::Type{NaturalEarthRaster{Version, Scale, Name}}) where {Version, Scale, Name}
return string(Name) * ".tif"
end

function RasterDataSources.getraster(T::Type{NaturalEarthRaster{Version, Scale, Name}}, layer = nothing) where {Version, Scale, Name}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't layer be Name ? that's how the rest of RasterDataSources.jl works unless I misunderstand something.

A NTuple{N,Symbol} for layer will then give you a RasterStack automatically, or just a NamedTuple returned from getraster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ne = NaturalEarth{10 #= scale =#}
Raster(ne, :HY_SR)

do you mean something like this, where each type represents a scale, and has its own set of layers we can pull from? That makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just have a scale keyword so there is no recompilation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raster(NaturalEarth, :HY_SR; scale=10)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so turns out that wont work! Because Rasters.jl has a fixed list of keywords to pass to getraster which does not include scale.

That's not going to "scale". I think we need a RasterDataSource to declare its keywords to make this more flexible.

Like:

getraster_keywords(NaturalEarth) = (:scale, :version) 

Then Rasters.jl can just pass those through and keep the rest.

Copy link
Member Author

@asinghvi17 asinghvi17 Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd also like to discuss the getraster interface more generally...the reliance on types kind of confused me.

Would love to chat about it at MakieCon :D

Copy link
Member

@rafaqz rafaqz Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With no types at all we have no dispatch. But do you mean types rather than constructed objects?

Partly its just historical, I didnt start the package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rewrite it all to use structs with fields.

Basically I organised and standardised the few initial sources as much as possible, added a few more, and have had very little time to put into it since.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raster_path = RasterDataSources.rasterpath(T)
if !isfile(raster_path)
zip_path = RasterDataSources.zippath(T)
RasterDataSources._maybe_download(RasterDataSources.zipurl(T), zip_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending methods with underscores is risky, that's marking them as internal so I can change them in a patch version.

Maybe we should may maybe_download part of the API instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice, but as a utility function I could probably replicate it (I think it's just isfile(zip_path) ? zip_path : download(zipurl, zip_path)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zf = RasterDataSources.ZipFile.Reader(zip_path)
mkpath(dirname(raster_path))
raster_name = RasterDataSources.rastername(T)
@show zf
write(raster_path, read(_zipfile_to_read(raster_name, zf)))
close(zf)
end
return raster_path
end

function __init__()
global download_cache = Scratch.get_scratch!(@__MODULE__, "rasters")
end

# Raster API - hook into RasterDataSources

function check_scale(scale::Int)
return scale in (110, 50, 10)
end

function ne_file_name(scale::Int, type::String, category::String)
if type in (
"countries",
"map_units",
"map_subunits",
"sovereignty",
"tiny_countries",
"boundary_lines_land",
"pacific_groupings",
"breakaway_disputed_areas",
"boundary_lines_disputed_areas",
"boundary_lines_maritime_indicator")
type = "admin_0_" * type
elseif type == "states"
type = "admin_1_" * type
end


if category == "raster"
return type
else
return "ne_$(scale)m_$(type)"
end
end

end # module

# This is the name checker implementation for the NaturalEarth R package.

# function(scale = 110,
# type = "countries",
# category = c("cultural", "physical", "raster"),
# full_url = FALSE) {
# # check on permitted scales, convert names to numeric
# scale <- check_scale(scale)

# # check permitted category
# category <- match_arg(category)


# # add admin_0 to known types
# if (type %in% c(
# "countries",
# "map_units",
# "map_subunits",
# "sovereignty",
# "tiny_countries",
# "boundary_lines_land",
# "pacific_groupings",
# "breakaway_disputed_areas",
# "boundary_lines_disputed_areas",
# "boundary_lines_maritime_indicator"
# )) {
# type <- paste0("admin_0_", type)
# }


# # add admin_1 to known types
# # this actually just expands 'states' to the name including lakes
# if (type == "states") {
# type <- "admin_1_states_provinces_lakes"
# }


# if (category == "raster") {
# # raster seems not to have so straightforward naming, so require that name
# # is passed in type
# file_name <- paste0(type)
# } else {
# file_name <- paste0("ne_", scale, "m_", type)
# }


# # https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_admin_0_countries.zip
# if (full_url) {
# file_name <- paste0(
# "https://naturalearth.s3.amazonaws.com/",
# scale, "m_", category, "/",
# file_name, ".zip"
# )
# }

# return(file_name)
# }

end # end module
# end # end module