Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to st_extract() for time matching workflows #737

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

loreabad6
Copy link
Contributor

This PR has these main proposals:

  1. Allow st_extract() with time matching work on non-POINT geometries
  2. Allow to pass stars objects with a geometry dimension (vector geometry data cube) in the at parameter (for time matching purposes)
  3. Allow to pass a stars object with geometry attributes to perform aggregation on those geometries, with time matching

Minor additions

  1. Added an example for st_extract() with time matching for POINT geometries

Main drawbacks:

  1. Performance: depending on the size of the stars object in x or the size of the stars object and the attribute geometries in at, the aggregation may be very slow
  2. Interpolation: I did not test, but decided to avoid time interpolation for non-POINT geometries for now
  3. Terminology: having geometries in the attributes and in the dimensions becomes confusing in terms of terminology. I now refer to "attribute geometry" and "dimension geometry", hence, as a parameter to define the "attribute geometry" during extraction, I came up with sfc_attribute, but I am not sure if this is intuitive or if some other term should be used

Examples of the implementation (added also to the docs):

library(stars)
library(sf)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
r = read_stars(tif)
pnt = st_sample(st_as_sfc(st_bbox(r)), 10)

# Extraction on non-POINT geometries
poly = st_buffer(pnt, 1000)
st_extract(r, poly)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median    Mean  3rd Qu.     Max.
#> L7_ETMs.tif  12.53551 60.83467 71.35355 69.6446 82.46561 111.2731
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((293104.6 911155...,...,POLYGON ((299067.8 912063...
#> band                                                              NULL

# Extraction with time matching
rdate = c(r, r*2, along = "date")
dates = c(Sys.Date()-1, Sys.Date())
rdate = st_set_dimensions(rdate, "date", values = c(dates))

pntsf = st_sf(date = dates, geometry = pnt)
st_extract(split(rdate, "band"), pntsf, time_column = "date") # POINT geometries
#> Simple feature collection with 10 features and 7 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 291095.7 ymin: 9111551 xmax: 298358 ymax: 9120631
#> Projected CRS: SIRGAS 2000 / UTM zone 25S
#>     X1  X2  X3  X4  X5  X6       date                 geometry
#> 1   77  64  71  59 123 103 2025-02-11 POINT (292104.6 9111551)
#> 2  178 156 162 144 226 156 2025-02-12 POINT (293267.4 9115068)
#> 3   90  81  53  12  13  12 2025-02-11   POINT (296416 9112607)
#> 4  160 130 132 116 194 150 2025-02-12 POINT (291969.3 9111688)
#> 5   91  76  91  60 137 120 2025-02-11 POINT (297487.9 9118260)
#> 6  198 180 128  30  28  24 2025-02-12   POINT (298358 9111923)
#> 7   76  60  66  48 113  96 2025-02-11 POINT (291095.7 9115184)
#> 8  124 104 100 146 180 122 2025-02-12 POINT (292453.5 9118373)
#> 9   75  59  65  61 110  89 2025-02-11 POINT (293342.5 9113729)
#> 10 166 152 158 162 238 182 2025-02-12 POINT (298067.8 9120631)

polysf = st_buffer(pntsf, 1000)
st_extract(split(rdate, "band"), polysf, time_column = "date") # POLYGON geometries
#> Simple feature collection with 10 features and 7 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 290095.7 ymin: 9110551 xmax: 299358 ymax: 9121631
#> Projected CRS: SIRGAS 2000 / UTM zone 25S
#>           X1        X2        X3        X4        X5        X6       date
#> 1   82.53757  67.94027  71.48459  56.33730 103.24486  82.44162 2025-02-11
#> 2  160.15691 138.08234 141.55153 139.09632 207.04609 149.29052 2025-02-12
#> 3   90.40538  80.29757  64.38004  24.84902  32.75362  26.65331 2025-02-11
#> 4  165.18638 135.86535 142.44502 112.28334 204.83910 163.38965 2025-02-12
#> 5   88.78499  77.56171  83.10039  59.71695 101.63519  81.54825 2025-02-11
#> 6  196.06179 180.88494 132.64844  26.91690  27.03977  25.07102 2025-02-12
#> 7   78.22122  65.55472  67.76636  61.20724 106.63338  81.86106 2025-02-11
#> 8  128.03567 104.13130  87.57250 154.82140 156.04911  86.54381 2025-02-12
#> 9   84.37730  71.99897  76.71632  62.83088 111.27308  87.67080 2025-02-11
#> 10 143.25937 117.99595 107.46302 137.52178 136.29990  86.23506 2025-02-12
#>                          geometry
#> 1  POLYGON ((293104.6 9111551,...
#> 2  POLYGON ((294267.4 9115068,...
#> 3  POLYGON ((297416 9112607, 2...
#> 4  POLYGON ((292969.3 9111688,...
#> 5  POLYGON ((298487.9 9118260,...
#> 6  POLYGON ((299358 9111923, 2...
#> 7  POLYGON ((292095.7 9115184,...
#> 8  POLYGON ((293453.5 9118373,...
#> 9  POLYGON ((294342.5 9113729,...
#> 10 POLYGON ((299067.8 9120631,...

vdc = st_sf(rdm = rnorm(20), polygons = st_buffer(st_sample(st_bbox(pnt), 20), 500),
            geometry = rep(pnt, 2), date = rep(dates, each = 10)) |> 
    st_as_stars(dims = c("geometry", "date"))

(vdc_new = st_extract(split(rdate, "band"), vdc, time_column = "date")) # stars vector data cube
#> stars object with 2 dimensions and 6 attributes
#> attribute(s):
#>     Min. 1st Qu. Median   Mean 3rd Qu. Max.
#> X1    62   82.25  111.5 123.30   161.5  198
#> X2    52   73.25   97.0 105.15   135.5  180
#> X3    50   66.00   95.5 102.90   132.0  182
#> X4    12   55.50   72.5  80.85   118.5  162
#> X5    13   95.25  121.0 139.35   221.5  274
#> X6    12   71.50   99.5 110.55   161.5  240
#> dimension(s):
#>          from to     offset  delta                     refsys point
#> geometry    1 10         NA     NA SIRGAS 2000 / UTM zone 25S  TRUE
#> date        1  2 2025-02-11 1 days                       Date    NA
#>                                                         values
#> geometry POINT (292104.6 9111551),...,POINT (298067.8 9120631)
#> date                                                      NULL
merge(vdc_new, name = "band")
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>                    Min. 1st Qu. Median   Mean 3rd Qu. Max.
#> X1.X2.X3.X4.X5.X6    12   69.75  101.5 110.35     152  274
#> dimension(s):
#>          from to     offset  delta                     refsys point
#> geometry    1 10         NA     NA SIRGAS 2000 / UTM zone 25S  TRUE
#> date        1  2 2025-02-11 1 days                       Date    NA
#> band        1  6         NA     NA                         NA    NA
#>                                                         values
#> geometry POINT (292104.6 9111551),...,POINT (298067.8 9120631)
#> date                                                      NULL
#> band                                                 X1,...,X6

(vdc_new2 = st_extract(split(rdate, "band"), vdc,
                       time_column = "date", sfc_attribute = "polygons")) # stars vector data cube
#> stars object with 2 dimensions and 6 attributes
#> attribute(s):
#>         Min.  1st Qu.    Median      Mean  3rd Qu.     Max.
#> X1  65.28822 81.43757 114.21923 120.29934 157.5105 185.9793
#> X2  53.61880 69.96624  98.44209 103.58648 136.1219 166.8657
#> X3  46.74070 69.57722  89.88534  99.15060 130.4103 151.1304
#> X4  13.31851 60.57830  74.05230  89.85701 132.0244 162.4108
#> X5  13.17684 89.29392 111.38950 128.37626 191.2518 220.3734
#> X6  12.04964 62.15886  88.00427  93.65377 143.7670 169.8050
#> dimension(s):
#>          from to     offset  delta                     refsys point
#> geometry    1 10         NA     NA SIRGAS 2000 / UTM zone 25S  TRUE
#> date        1  2 2025-02-11 1 days                       Date    NA
#>                                                         values
#> geometry POINT (292104.6 9111551),...,POINT (298067.8 9120631)
#> date                                                      NULL
merge(vdc_new2, name = "band")
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>                        Min.  1st Qu.  Median     Mean  3rd Qu.     Max.
#> X1.X2.X3.X4.X5.X6  12.04964 69.57722 93.6122 105.8206 143.3084 220.3734
#> dimension(s):
#>          from to     offset  delta                     refsys point
#> geometry    1 10         NA     NA SIRGAS 2000 / UTM zone 25S  TRUE
#> date        1  2 2025-02-11 1 days                       Date    NA
#> band        1  6         NA     NA                         NA    NA
#>                                                         values
#> geometry POINT (292104.6 9111551),...,POINT (298067.8 9120631)
#> date                                                      NULL
#> band                                                 X1,...,X6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant