Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Planetary Computer 10m 3DEP Tiffs inconsistent with USGS Tiffs #399

Open
scottyhq opened this issue Dec 30, 2024 · 3 comments
Open

Planetary Computer 10m 3DEP Tiffs inconsistent with USGS Tiffs #399

scottyhq opened this issue Dec 30, 2024 · 3 comments

Comments

@scottyhq
Copy link

There are a number of related issues about inconsistent metadata, e.g #277

Here is a different case where the metadata (geotransform) seems OK (nearly identical), but tiff values are quite different compared to the source data.

This pipeline https://github.com/stactools-packages/threedep just deals with creating metadata, is the pipeline for COG creation also public? It's not obvious to me why these values would differ by several meters... It would be good in the pipeline to verify the values are equivalent.

# Illustration of discrepance between USGS & PC 10m 3DEP products
import xarray as xr
import rasterio

uri = 'https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/current/n39w107/USGS_13_n39w107.tif'
with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                  AWS_NO_SIGN_REQUEST='YES'):
    daUSGS = xr.open_dataarray(uri, engine='rasterio').squeeze().isel(x=slice(0, 256), y=slice(0, 256)).load()

uri = 'https://ai4edataeuwest.blob.core.windows.net/3dep/Elevation/13/TIFF/n39w107/USGS_13_n39w107.tif'
with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                  VSICURL_PC_URL_SIGNING='YES'):
    daPC = xr.open_dataarray(uri, engine='rasterio').squeeze().isel(x=slice(0, 256), y=slice(0, 256)).load()

xr.testing.assert_allclose(daUSGS, daPC)
AssertionError: Left and right DataArray objects are not close
Differing values:
L
    array([[3099.971 , 3104.6487, 3108.4382, ..., 3607.6433, 3609.839 , 3612.1143],
R
    array([[3105.152 , 3106.3528, 3108.317 , ..., 3617.3018, 3618.7676, 3619.058 ],
...
@777arc
Copy link
Contributor

777arc commented Dec 31, 2024

The STAC item you mentioned is part of the 3dep-seamless collection which are COGs copied straight from USGS, I believe. For reference, here is the list of 3DEP collections in PC and which ones are generated by PC vs copied from USGS:

  • 3dep-seamless - DEMs straight from USGS
  • 3dep-lidar-copc - Raw LIDAR data in COPC/LAZ format straight from USGS
  • Generated from 3dep-lidar-copc as part of PC internal pipeline:
    • 3dep-lidar-intensity
    • 3dep-lidar-returns
    • 3dep-lidar-classification
    • 3dep-lidar-pointsourceid
    • 3dep-lidar-dsm
    • 3dep-lidar-dtm
    • 3dep-lidar-hag
    • 3dep-lidar-dtm-native

The COG creation pipeline used for the derived products isn't public unfortunately, it's using PDAL though. But I think this specific spatial inconsistency is unrelated.

Looks like this isn't the first time the spatial extent bug was mentioned, see https://github.com/microsoft/planetary-computer-tasks/tree/main/datasets/usgs-lidar although this might only be specific to the generated collections, not 3dep-seamless

@ghidalgo3
Copy link

Another possibility we explored is that Planetary Computer is missing the latest version of the data from USGS.

Using Lidar Explorer, I looked up historical DEMs for the location you mentioned. I found these files:

https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/historical/n39w107/USGS_13_n39w107_20210312.tif
https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/historical/n39w107/USGS_13_n39w107_20211208.tif
https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/historical/n39w107/USGS_13_n39w107_20220331.tif

Looking at the "last modified" date on Planetary Computer's files, we copied them over from the USGS servers on 3/30/2021. I ran a quick gdalinfo on the 3 historical tifs from the USGS server and none of them exactly matched Planetary Computer's version, but maybe the data inside one of them does match when you put it through xarray?

@scottyhq
Copy link
Author

scottyhq commented Jan 2, 2025

Thanks for looking into this @ghidalgo3 @777arc.

Looking at the "last modified" date on Planetary Computer's files, we copied them over from the USGS servers on 3/30/2021

This is good to know. It'd be great to include that in STAC metadata going forward to better track provenance (not just for 3DEP, but it did also come up in the context of this past discussion #214)

but maybe the data inside one of them does match

You can also look at the source files via the S3 browser link https://prd-tnm.s3.amazonaws.com/index.html?prefix=StagedProducts/Elevation/13/TIFF/historical/n39w107/, and I'm seeing 'last modified' dates there of 2022-12-03. In any case, I did re-run the code above for those different versions and see the same mismatched values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants