Add more descriptive docs + some experiments (#108)

* Add more descriptive docs + some experiments
* Update docs project to include experiment packages
* Add more benchmark files (still raw)
* Update apply return type docs
  Co-authored-by: Rafael Schouten <[email protected]>
* Update docs/src/paradigms.md
  Co-authored-by: Rafael Schouten <[email protected]>
* Update docs/src/paradigms.md
  Co-authored-by: Rafael Schouten <[email protected]>
* Update docs/src/peculiarities.md
* Add code for `orient` demo
  This is a dashboard that's meant to be run. Will add an animation later as well.
* Add examples from issue
* Add a true summary figure to the docs
* Import the relevant Chairmarks/BenchmarkTools functions
* Update Project.toml
* Write the GeometryOps HackMD call notes to the docs
  As a hidden page for now but that can always change!
* Add MultiFloats
* Add NaturalEarth.jl devbranch when building docs
* make Julia actually execute the code
* Add Statistics, fix namespacing error
* `geometry_providers.jl`: Remove redundancy, add comments
* `vector_benchmark_plot.jl`: add a comment on top
* rearrange file
* Add warning that BoolsAsTypes are not public API

---------

Co-authored-by: Rafael Schouten <[email protected]>
JuliaGeo · Apr 23, 2024 · 8f46d15
1 parent c671b34
commit 8f46d15

Showing 9 changed files with 533 additions and 0 deletions.
diff --git a/.github/workflows/CI.yml b/.github/workflows/CI.yml
@@ -48,6 +48,8 @@ jobs:
       - uses: julia-actions/setup-julia@v1
         with:
           version: '1'
+      - name: Add custom versions of packages
+        run: julia --project=docs -e 'using Pkg; Pkg.add(PackageSpec(; url = "https://github.com/JuliaGeo/NaturalEarth.jl", rev = "as/scratchspaces"))'
       - uses: julia-actions/julia-buildpkg@v1
       - uses: julia-actions/julia-docdeploy@v1
         env:

diff --git a/benchmarks/geometry_providers.jl b/benchmarks/geometry_providers.jl
@@ -0,0 +1,182 @@
+#=
+# Geometry providers
+
+This file benchmarks GeometryOps methods on every GeoInterface.jl implementation we can find, in order to test:
+a. genericness, i.e., does GeometryOps work correctly with all GeoInterface.jl implementations?
+b. performance, i.e., how does GeometryOps compare to the native implementation?
+c. performance issues in the packages' implementations of GeoInterface
+=#
+
+# First, we import the providers, plus GeoInterface and GeometryOps under short aliases:
+using ArchGDAL, LibGEOS, Shapefile, GeoJSON, WellKnownGeometry, GeometryBasics, GeoInterface, GeoFormatTypes
+import GeometryOps as GO, GeoInterface as GI
+# Now, we list the providers to benchmark (this needs the `GI` alias from the line above),
+PROVIDERS = (ArchGDAL, LibGEOS, GeometryBasics, GI.Wrappers)
+# Finally, we import some utility benchmarking, plotting and data munging packages!
+using BenchmarkTools, Chairmarks, CairoMakie, MakieThemes, DataFrames, Proj
+using CoordinateTransformations, Rotations, Statistics
+
+
+# Polylabel.jl is a package that finds the "pole of inaccessibility" of a polygon,
+# i.e., the point within it that is furthest away from its boundaries.  
+
+# It depends on GeometryOps, but in this instance, we'll grab some of its test geometries
+# to use.
+import Polylabel
+
+# TODO: we convert to LibGEOS as an intermediate step here so that the
+# linear rings of the WKG polygons are interpreted correctly.  Unfortunately,
+# reading them directly does not work yet; there is an open issue for that.
+water1 = GeoFormatTypes.WellKnownText(GeoFormatTypes.Geom(), readchomp(joinpath(dirname(dirname(pathof(Polylabel))), "test", "data", "water1.wkt")) |> String) |> x -> GI.convert(LibGEOS, x) |> GO.tuples
+water2 = GeoFormatTypes.WellKnownText(GeoFormatTypes.Geom(), readchomp(joinpath(dirname(dirname(pathof(Polylabel))), "test", "data", "water2.wkt")) |> String) |> x -> GI.convert(LibGEOS, x) |> GO.tuples
+# Fixing these polygons is a complicated task, and even then LibGEOS gets it wrong:
+# water1 |> x -> LibGEOS.makeValid(GI.convert(LibGEOS, x)) |> GI.getgeom |> collect |> x -> filter(y -> GI.trait(y) isa Union{GI.PolygonTrait, GI.MultiPolygonTrait}, x) |> first |> GO.tuples # hide
+
+f, a, p = poly(water1; axis = (; title = "water1")); poly(f[1, 2], water2; axis = (; title = "water2")); f
+# Now, we rotate the `water1` polygon about its centroid, so we can use it to 
+# test the time it takes to intersect complex polygons:
+water1r = GO.transform(
+    Translation(GO.centroid(water1)) ∘ LinearMap(Makie.rotmatrix2d(π/2)) ∘ Translation((-).(GO.centroid(water1))), 
+    water1
+)
+f, a, p = poly(water1; label = "Original")
+poly!(water1r; label = "Rotated")
+axislegend(a)
+f
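+# A minimal sketch of why the composition above rotates about the centroid: shift the
+# centre to the origin, rotate, then shift back.  The names `toy_c`, `toy_R` and `toy_rot`
+# are hypothetical, and the explicit matrix stands in for `Makie.rotmatrix2d(π/2)`,
+# assuming that helper returns the usual counterclockwise rotation matrix.
+# This relies on CoordinateTransformations, which is loaded at the top of the file.
+toy_c = (1.0, 2.0)            # stand-in for `GO.centroid(water1)`
+toy_R = [0.0 -1.0; 1.0 0.0]   # counterclockwise rotation by π/2
+toy_rot = Translation(toy_c) ∘ LinearMap(toy_R) ∘ Translation((-).(toy_c))
+toy_rot([1.0, 2.0]) ≈ [1.0, 2.0] # the centre of rotation is a fixed point
+toy_rot([2.0, 2.0]) ≈ [1.0, 3.0] # a point one unit to the right ends up one unit above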
+# WARNING: does not work
+@b GO.union($(water1), $(water1r); target = GI.PolygonTrait()) seconds=3
+@b LibGEOS.union($(GI.convert(LibGEOS, water1)), $(GI.convert(LibGEOS, water1r))) seconds=3
+@b ArchGDAL.union($(GI.convert(ArchGDAL, water1)), $(GI.convert(ArchGDAL, water1r))) seconds=3
+
+poly(GO.union(water1, water1r; target = GI.PolygonTrait()))
+
+GI.getgeom(water1, 3) |> GI.trait
+
+# We can benchmark each provider and see if any of them have glaring issues.
+
+water1_centroid_suite = BenchmarkGroup()
+
+for provider in PROVIDERS
+    @info "Benchmarking $provider"
+    geom = GI.convert(provider, water1)
+    water1_centroid_suite[string(provider)] = @be GO.centroid($geom) seconds=3
+end
+
+
+# ## Tables.jl performance in `apply`
+#=
+This code checks how Tables.jl performs when using `apply`.
+We use two sources for this: `Shapefile.jl` and `DataFrames.jl`.
+More will be coming in the future!
+=#
+shp_file = "/Users/anshul/Downloads/ne_10m_admin_0_countries (1)/ne_10m_admin_0_countries.shp"
+table = Shapefile.Table(shp_file)
+go_df = DataFrame(table)
+go_df.geometry = GO.tuples(go_df.geometry);
+
+table_suite = BenchmarkGroup()
+
+
+ll2moll = Proj.Transformation("+proj=longlat +datum=WGS84", "+proj=moll")
+
+# First, we try reprojecting the geometries using Proj,
+reproject_suite = table_suite["reproject"] = BenchmarkGroup(["title:Reproject", "subtitle:All country borders from Natural Earth, 1:10m res."])
+
+reproject_suite["Shapefile.Table"] = @be GO.reproject($table, $ll2moll) seconds=3
+reproject_suite["DataFrame (Shapefile)"] = @be GO.reproject($(DataFrame(table)), $ll2moll) seconds=3
+reproject_suite["DataFrame (GO)"] = @be GO.reproject($(go_df), $ll2moll) seconds=3
+reproject_suite["Shapefile geoms"] = @be GO.reproject($(table.geometry), $ll2moll) seconds=3
+reproject_suite["GeometryOps geoms"] = @be GO.reproject($(GO.tuples(table.geometry)), $ll2moll) seconds=3
+
+# then transforming, just to see the difference in runtime
+# between calling out to C vs pure Julia,
+function _scaleby5(x)
+    return x .* 5
+end
+
+transform_suite = table_suite["transform"] = BenchmarkGroup(["title:Transform", "subtitle:All country borders from Natural Earth, 1:10m res."])
+transform_suite["Shapefile.Table"] = @be GO.transform($_scaleby5, $table) seconds=3
+transform_suite["DataFrame (Shapefile)"] = @be GO.transform($_scaleby5, $(DataFrame(table))) seconds=3
+transform_suite["DataFrame (GO)"] = @be GO.transform($_scaleby5, $(go_df)) seconds=3
+transform_suite["Shapefile geoms"] = @be GO.transform($_scaleby5, $(table.geometry)) seconds=3
+transform_suite["GeometryOps geoms"] = @be GO.transform($_scaleby5, $(GO.tuples(table.geometry))) seconds=3
+
+# and finally, calling `applyreduce` to find the area of each
+# polygon.
+area_suite = table_suite["area"] = BenchmarkGroup(["title:Area", "subtitle:All country borders from Natural Earth, 1:10m res."])
+
+area_suite["Shapefile.Table"] = @be GO.area($(table)) seconds=3
+area_suite["DataFrame (Shapefile)"] = @be GO.area($(DataFrame(table))) seconds=3
+area_suite["DataFrame (GO)"] = @be GO.area($(go_df)) seconds=3
+area_suite["Shapefile geoms"] = @be GO.area($(table.geometry)) seconds=3
+area_suite["GeometryOps geoms"] = @be GO.area($(GO.tuples(table.geometry))) seconds=3
+
+ts = getproperty.(area_suite["Shapefile.Table"].samples, :time)
+boxplot(ones(length(ts)), ts)
+violin(ones(length(ts)), ts; npoints = 3500, axis = (; yscale = log10,))
+
+
+# ## Plotting
+function Makie.convert_arguments(::Makie.PointBased, xs, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = getproperty.(Statistics.mean.(bs), :time)
+    return (xs, ts)
+end
+
+function Makie.convert_arguments(::Makie.PointBased, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = getproperty.(Statistics.mean.(bs), :time)
+    return (1:length(bs), ts)
+end
+
+function Makie.convert_arguments(::Makie.SampleBased, b::Chairmarks.Benchmark)
+    ts = getproperty.(b.samples, :time)
+    return (ones(length(ts)), ts)
+end
+
+function Makie.convert_arguments(::Makie.SampleBased, n::Number, b::Chairmarks.Benchmark)
+    ts = getproperty.(b.samples, :time)
+    return (fill(n, length(ts)), ts)
+end
+
+function Makie.convert_arguments(::Makie.SampleBased, labels::AbstractVector{<: AbstractString}, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = map(b -> getproperty.(b.samples, :time), bs)
+    flat_labels = reduce(vcat, fill.(labels, length.(ts))) # repeat each label once per sample
+    return (flat_labels, reduce(vcat, ts))
+end
+
+function Makie.convert_arguments(::Type{Makie.Errorbars}, xs, bs::AbstractVector{<: Chairmarks.Benchmark})
+    ts = map(b -> getproperty.(b.samples, :time), bs)
+    means = map(Statistics.mean, ts)
+    stds = map(Statistics.std, ts)
+    return (xs, means, stds)
+end
+
+ks = keys(area_suite) |> collect .|> identity
+
+bs = getindex.((area_suite,), ks)
+b_lengths = length.(getproperty.(bs, :samples))
+b_timing_flattened = collect(Iterators.flatten(Iterators.map(b -> getproperty.(b.samples, :time), bs)))
+k_strings = Iterators.flatten((fill(k, bl) for (k, bl) in zip(ks, b_lengths))) |> collect
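+# A minimal sketch of the flattening above on toy data, so the pairing of labels and timings
+# is easy to check by eye.  `toy_keys` and `toy_samples` are hypothetical stand-ins for `ks`
+# and the per-benchmark sample times; no benchmark is run here.
+toy_keys = ["Shapefile.Table", "DataFrame (GO)"]
+toy_samples = [[0.3, 0.2, 0.4], [0.1, 0.1]]
+toy_times = collect(Iterators.flatten(toy_samples))
+toy_labels = collect(Iterators.flatten(fill(k, length(s)) for (k, s) in zip(toy_keys, toy_samples)))
+# Each timing now has exactly one label, in the same order:
+toy_labels == ["Shapefile.Table", "Shapefile.Table", "Shapefile.Table", "DataFrame (GO)", "DataFrame (GO)"]
+length(toy_labels) == length(toy_times)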
+
+f = Figure()
+ax = Axis(f[1, 1];
+    convert_dim_1=Makie.CategoricalConversion(; sortby=nothing),
+)
+violin!(ax, k_strings, b_timing_flattened) # raw times; the log10 axis scale is applied below
+f
+ax.yscale = log10
+ax.xticklabelrotation = π/12
+f
+
+
+bs = values(area_suite) |> collect .|> identity
+labels = ["ST", "DS", "DG", "SG", "GG"]
+
+
+using AlgebraOfGraphics
+
+boxplot(bs[1])
+boxplot!.(1:5, values(area_suite) |> collect .|> identity)
+Makie.current_figure()
+Makie.current_axis().yscale = log10
+
+data((; x = labels, y = bs)) * mapping(:y => verbatim, :x, :y) * visual(BoxPlot) |> draw
diff --git a/benchmarks/vector_benchmark_plot.jl b/benchmarks/vector_benchmark_plot.jl
@@ -0,0 +1,125 @@
+#=
+# `vector-benchmark` result plot
+
+This code plots the results of the `kadyb/vector-benchmark` repository.
+For now, it needs the MakieTeX SVG PR.
+
+The unique feature (and what takes up so many lines of code) is that
+the scatter markers for each language are SVGs of that language's logo!  This
+makes the plot eye-catching and lets readers quickly compare performance
+across languages.
+
+Stepwise, here's what is going on:
+1. It loads the benchmark data from a CSV file into a DataFrame.
+2. It defines color and marker mappings for each package, where the markers are SVG logos of the respective programming languages.
+3. It uses the beeswarm function from the SwarmMakie package to create a scatter plot, where the x-axis represents the different benchmark tasks, and the y-axis represents the median execution time (in seconds) on a log scale.
+4. The scatter points are colored and marked according to the package and programming language, using the predefined color and marker mappings.
+5. It adds a legend to the plot, displaying the package names and their corresponding language logos.
+
+=#
+
+using CairoMakie, MakieTeX, SwarmMakie
+
+using CSV, DataFrames, CategoricalArrays
+using DataToolkit
+
+path_to_makietex_datatoml = joinpath(dirname(dirname(@__DIR__)), "MakieTeX", "docs", "Data.toml")
+data = DataToolkit.load(path_to_makietex_datatoml)
+
+
+# CairoMakie, SwarmMakie (beeswarm plots), MakieTeX (SVG icons), DataToolkit and
+# DataFrames are already loaded above, so only the remaining packages are needed here.
+using StatsBase, Colors
+
+function svg_icon(name::String)
+    path = "svg/$name.svg" # defined up front so the fallback lookups below can always use it
+    if name == "go"
+        icon = d"go-logo-solid::IO"
+    else
+        icon = get(d"file-icons::Dict{String,IO}", path, nothing)
+    end
+    if isnothing(icon)
+        icon = get(d"file-icons-mfixx::Dict{String,IO}", path, nothing)
+    end
+    if isnothing(icon)
+        icon = get(d"file-icons-devopicons::Dict{String,IO}", path, nothing)
+    end
+    isnothing(icon) && return missing
+    return CachedSVG(read(seekstart(icon), String))
+end
+
+const colours_vibrant = range(LCHab(60,70,0), stop=LCHab(60,70,360), length=36)
+const colours_dim     = range(LCHab(25,50,0), stop=LCHab(25,50,360), length=36)
+
+const julia_logo = svg_icon("Julia")
+const r_logo = svg_icon("R")
+const python_logo = svg_icon("python")
+
+marker_map = Dict(
+    "geometryops" => julia_logo,
+    # "gdal-jl" => julia_logo,
+    "sf" => r_logo, 
+    "terra" => r_logo, 
+    "geos" => r_logo, 
+    "s2" => r_logo,
+    "geopandas" => python_logo,
+)
+
+
+color_map = Dict(
+    # R packages
+    "sf" => Makie.wong_colors()[1],
+    "s2" => Makie.wong_colors()[5],
+    "terra" => Makie.wong_colors()[6],
+    "geos" => Makie.wong_colors()[4],
+    # Python package
+    "geopandas" => Makie.wong_colors()[2],
+    # Julia package
+    "geometryops" => Makie.wong_colors()[3],
+)
+
+path_to_vector_benchmark = "/Users/anshul/git/vector-benchmark"
+timings_df = CSV.read(joinpath(path_to_vector_benchmark, "timings.csv"), DataFrame)
+replace!(timings_df.package, "sf-project" => "sf", "sf-transform" => "sf")
+
+# now plot
+
+task_ca = CategoricalArray(timings_df.task)
+
+group_marker = [MarkerElement(; color = color_map[package], marker = marker_map[package], markersize = 12) for package in keys(marker_map)]
+names_marker = collect(keys(marker_map))
+lang_markers = ["R" => r_logo, "Python" => python_logo, "Julia" => julia_logo]
+group_package = [MarkerElement(; marker, markersize = 12) for (lang, marker) in lang_markers]
+names_package = first.(lang_markers)
+
+
+f, a, p = beeswarm(
+    task_ca.refs, timings_df.median;
+    marker = getindex.((marker_map,), timings_df.package), 
+    color = getindex.((color_map,), timings_df.package),
+    markersize = 10,
+    axis = (;
+        xticks = (1:length(task_ca.pool.levels), task_ca.pool.levels),
+        xlabel = "Task",
+        ylabel = "Median time (s)",
+        yscale = log10,
+        title = "Benchmark vector operations",
+        xgridvisible = false,
+        xminorgridvisible = true,
+        yminorgridvisible = true,
+        yminorticks = IntervalsBetween(5),
+        ygridcolor = RGBA{Float32}(0.0f0,0.0f0,0.0f0,0.05f0),
+    )
+)
+leg = Legend(
+    f[1, 2],
+    [group_marker, group_package],
+    [names_marker, names_package],
+    ["Package", "Language"],
+    tellheight = false,
+    tellwidth = true,
+    gridshalign = :left,
+)
+resize!(f, 650, 450)
+a.spinewidth[] = 0.5
+f
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -1,4 +1,5 @@
 [deps]
+AccurateArithmetic = "22286c92-06ac-501d-9306-4abd417d9753"
 Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
@@ -10,6 +11,8 @@ DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"
+DoubleFloats = "497a8b3b-efae-58df-a0af-a86822472b78"
+ExactPredicates = "429591f6-91af-11e9-00e2-59fbe8cec110"
 GeoDatasets = "ddc7317b-88db-5cb5-a849-8449e5df04f9"
 GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f"
 GeoInterfaceMakie = "0edc0954-3250-4c18-859d-ec71c1660c08"
@@ -20,7 +23,9 @@ LibGEOS = "a90b1aa1-3769-5649-ba7e-abc5a9d163eb"
 Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
 Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
 MakieThemes = "e296ed71-da82-5faf-88ab-0034a9761098"
+MultiFloats = "bdf0d083-296b-4888-a5b6-7498122e68a5"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 Proj = "c94c279d-25a6-4763-9509-64d165bea63e"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4"
+Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
diff --git a/docs/make.jl b/docs/make.jl
@@ -73,6 +73,9 @@ withenv("JULIA_DEBUG" => "Literate") do # allow Literate debug output to escape
     # TODO: We should probably fix the above in `process_literate_recursive!`.
 end
 
+# Now that the Literate stuff is done, we also download the call notes from HackMD:
+download("https://hackmd.io/kpIqAR8YRJOZQDJjUKVAUQ/download", joinpath(@__DIR__, "src", "call_notes.md"))
+
 # Finally, make the docs!
 makedocs(;
     modules=[GeometryOps],
@@ -91,6 +94,10 @@ makedocs(;
     pages=[
         "Introduction" => "introduction.md",
         "API Reference" => "api.md",
+        "Explanations" => [
+            "Paradigms" => "paradigms.md",
+            "Peculiarities" => "peculiarities.md",
+        ],
         "Source code" => literate_pages,
     ],
     warnonly = true,