Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dimension loss in aggregate with exact=TRUE and na.rm=TRUE #738

Merged
merged 1 commit into from
Feb 13, 2025

Conversation

loreabad6
Copy link
Contributor

No description provided.

@edzer
Copy link
Member

edzer commented Feb 12, 2025

Can you provide an example when this is needed? Looking at the code, replacing NA with zeroes only makes sense when the function is sum, right?

@loreabad6
Copy link
Contributor Author

Of course! When trying an extraction with polygons and using exact = TRUE you get all NA values, so naturally na.rm = TRUE is needed but it was failing due to the loss of dimensions.

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(sf)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
r = read_stars(tif)
pnt = st_sample(st_as_sfc(st_bbox(r)), 10)

# Extraction on non-POINT geometries
poly = st_buffer(pnt, 1000)
st_extract(r, poly)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> L7_ETMs.tif  44.55272 60.83816 71.76914 71.40746 81.11012 108.5522
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((291076.9 912073...,...,POLYGON ((291925.1 911756...
#> band                                                              NULL
st_extract(r, poly, exact = TRUE)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>              Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> L7_ETMs.tif    NA      NA     NA  NaN      NA   NA   60
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((291076.9 912073...,...,POLYGON ((291925.1 911756...
#> band                                                              NULL
st_extract(r, poly, exact = TRUE, na.rm = TRUE)
#> Error in dim.stars(x): length(d) == length(dim(x[[1]])) is not TRUE
st_extract(r, poly, exact = TRUE, FUN = sum, na.rm = TRUE)
#> Error in dim.stars(x): length(d) == length(dim(x[[1]])) is not TRUE

With the change we get:

st_extract(r, poly, exact = TRUE)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>              Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> L7_ETMs.tif    NA      NA     NA  NaN      NA   NA   60
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((291327.3 911290...,...,POLYGON ((292269.5 911186...
#> band                                                              NULL
st_extract(r, poly, exact = TRUE, na.rm = TRUE)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> L7_ETMs.tif  13.18559 55.44356 69.09618 65.69236 83.72314 107.6003
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((291327.3 911290...,...,POLYGON ((292269.5 911186...
#> band                                                              NULL
st_extract(r, poly, exact = TRUE, FUN = sum, na.rm = TRUE)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu. Median     Mean  3rd Qu.     Max.
#> L7_ETMs.tif  32825.58 155454.6 212844 211520.2 273478.1 400403.8
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((291327.3 911290...,...,POLYGON ((292269.5 911186...
#> band                                                              NULL

@edzer
Copy link
Member

edzer commented Feb 12, 2025

Fair enough, at least that gives answers, even if wrong. I wondered whether the answers are correct for FUN=mean (the dominant use case and efault), as NA values are replaced here by zeroes.

@loreabad6
Copy link
Contributor Author

loreabad6 commented Feb 12, 2025

Most likely they wouldn't be correct. If we compare with and without exact=TRUE there is a slight shift toward lower values.
By the way, na.rm=TRUE does not seem to have any effect when exact=FALSE. I was investigating this earlier but paused it for now.

st_extract(r, poly)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> L7_ETMs.tif  12.01004 65.21413 72.92868 71.03802 84.39935 115.6306
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((299367.8 911434...,...,POLYGON ((298113.9 911924...
#> band                                                              NULL
st_extract(r, poly, exact = TRUE, na.rm = TRUE)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> L7_ETMs.tif  11.94298 64.73291 72.41612 70.58727 83.89617 114.9979
#> dimension(s):
#>          from to                     refsys point
#> geometry    1 10 SIRGAS 2000 / UTM zone 25S FALSE
#> band        1  6                         NA    NA
#>                                                                 values
#> geometry POLYGON ((299367.8 911434...,...,POLYGON ((298113.9 911924...
#> band                                                              NULL

@edzer edzer merged commit 3c268ac into r-spatial:main Feb 13, 2025
7 checks passed
@edzer
Copy link
Member

edzer commented Feb 13, 2025

I believe it works, probably because weights from exactextractr are zero outside the area.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants