From 2b92ce57299f4109686a1c1921d6be6aaa452a24 Mon Sep 17 00:00:00 2001 From: Trey Stafford Date: Wed, 30 Oct 2024 10:05:39 -0600 Subject: [PATCH] Add notebook showing example usage of iceflow with icepyx --- notebooks/iceflow-with-icepyx.ipynb | 437 ++++++++++++++++++++++++++++ 1 file changed, 437 insertions(+) create mode 100644 notebooks/iceflow-with-icepyx.ipynb diff --git a/notebooks/iceflow-with-icepyx.ipynb b/notebooks/iceflow-with-icepyx.ipynb new file mode 100644 index 0000000..07e60bd --- /dev/null +++ b/notebooks/iceflow-with-icepyx.ipynb @@ -0,0 +1,437 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7174387c-05aa-4f8f-b3bd-902d4f635d9b", + "metadata": {}, + "source": [ + "# Using iceflow with icepyx\n", + "\n", + "This notebook shows how to compare iceflow-supported datasets with ICESat2 data using [icepyx](https://github.com/icesat2py/icepyx).\n", + "\n", + "If this is your first time using `iceflow`, we recommend starting with the [NSIDC Iceflow example Jupyter Notebook](https://iceflow.readthedocs.io/en/latest/iceflow-example.html) first.\n", + "\n", + "Similarly, if you are new to `icepyx`, we suggest reviewing the [icepyx documentation](https://icepyx.readthedocs.io/en/latest/) for more information about how to use `icepyx`." + ] + }, + { + "cell_type": "markdown", + "id": "64fb8fa5-e304-4607-9bad-33d3832cd70b", + "metadata": {}, + "source": [ + "## Scenario: assessing ice surface elevation change near Sermeq Kujalleq (Jakobshavn Isbrae)\n", + "\n", + "In this notebook, we will focus on a small area near Sermeq Kujalleq (Jakobshavn Isbrae).\n", + "\n", + "The steps this notebook will take are:\n", + "\n", + "* Search for and download iceflow-supported data for our area of interest and timeframe.\n", + "* Search for and download ICESat2 data using `icepyx`\n", + "* Average elevation data over our area of interest at a weekly resolution\n", + "* Plot the results as a timeseries\n", + "\n", + "But first, we setup our environment and define some constants that will be used for the rest of the notebook:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ad99b0d-4e15-466c-b9aa-7bfa75ef5f26", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "from pathlib import Path\n", + "import datetime as dt\n", + "\n", + "import dask.dataframe as dd\n", + "import icepyx as ipx\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "import xarray as xr\n", + "\n", + "from nsidc.iceflow.api import fetch_iceflow_df, create_iceflow_parquet\n", + "from nsidc.iceflow.data import ALL_DATASETS\n", + "from nsidc.iceflow.data.models import DatasetSearchParameters, BoundingBox, IceflowDataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d7143806-d23d-421a-8b7a-3ca15b05f7cf", + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "# This bounding box covers an area near Sermeq Kujalleq (Jakobshavn Isbrae)\n", + "BBOX = BoundingBox(lower_left_lon=-49.149, lower_left_lat=69.186, upper_right_lon=-48.949, upper_right_lat=69.238)\n", + "\n", + "# Range of dates we want to evaluate\n", + "DATE_RANGE = (dt.date(2007, 1, 1), dt.date(2024, 10, 28))\n", + "\n", + "# All of our data will be downloaded to this location. \n", + "# NOTE: the data supported by iceflow and icepyx can be very large!\n", + "OUTPUT_DIR = Path(\"./downloaded-data/\")\n", + "\n", + "# ICESat2 data products use ITRF2014 (e.g., see https://nsidc.org/data/atl06/versions/6):\n", + "# > WGS 84 ellipsoid, ITRF2014 reference frame\n", + "# NOTE/TODO: This is expected to change in the near future! ICESat2 release\n", + "# 7, scheduled for spring 2025, is expected to be referenced to ITRF2020.\n", + "ICESAT2_ITRF = \"ITRF2014\"" + ] + }, + { + "cell_type": "markdown", + "id": "037235e4-598d-4870-bbcc-c29a9f5ee173", + "metadata": {}, + "source": [ + "### Search for and download iceflow-supported data for our area of interest and timeframe.\n", + "\n", + "`iceflow`'s API provides a function `create_iceflow_parquet` that:\n", + "\n", + "* Finds data matching our `BBOX`, `DATE_RANGE`, and desired datasets\n", + "* Downloads and reads the data from the datasets' native formats\n", + "* Transforms all of the lat/lon/elev data into a target ITRF\n", + "* Writes out the lat/lon/elev data to a parquet dataset that can be be read by e.g., `dask` for further processing. \n", + "\n", + "Writing data to a parquet dataset allows `dask` (which we will use later!) to read the chunks of data it needs to do calculations (e.g., `mean`) without needing to read all of the data into memory at once. This is important because `iceflow` can find many millions of data points for even small areas of interest!\n", + "\n", + "Note: This next step may take a while, and will download approximately [TODO AMOUNT] of data to your local disk." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "713b93dd-d23b-422b-92dc-f9ed364a566c", + "metadata": {}, + "outputs": [], + "source": [ + "parquet_path = create_iceflow_parquet(\n", + " dataset_search_params=DatasetSearchParameters(\n", + " # `ALL_DATASETS` is a list that contains all of the datasets supported by iceflow.\n", + " # This lets us find all the data iceflow supports for our area of interest.\n", + " datasets=ALL_DATASETS,\n", + " bounding_box=BBOX,\n", + " temporal=DATE_RANGE,\n", + " ),\n", + " target_itrf=ICESAT2_ITRF,\n", + " output_dir=OUTPUT_DIR,\n", + " overwrite=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e29f9cf9", + "metadata": {}, + "outputs": [], + "source": [ + "# Now, read the data stored in the parquet dataset using dask\n", + "\n", + "iceflow_df = dd.read_parquet(parquet_path)\n", + "\n", + "# Ensure that our index is set as a datetime object \n", + "iceflow_df = iceflow_df.reset_index()\n", + "iceflow_df[\"utc_datetime\"] = dd.to_datetime(iceflow_df[\"utc_datetime\"])\n", + "iceflow_df = iceflow_df.set_index(\"utc_datetime\")\n", + "\n", + "iceflow_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "925eede5-e9be-4793-9b1d-9c5562c73b7c", + "metadata": {}, + "source": [ + "# Search for and download ICESat2 data using `icepyx`\n", + "\n", + "Next, we will use icepyx to find [ATL06](https://nsidc.org/data/atl06/versions/6) data for the same area of interest and timeframe. \n", + "\n", + "To learn more about `icepyx.Query`, which is used below, see the [documentation](https://icepyx.readthedocs.io/en/latest/_icepyx/icepyx.Query.html#icepyx.Query)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bda00a2c-51dd-4d01-8ba4-4dcb0985931d", + "metadata": {}, + "outputs": [], + "source": [ + "# We will compare the ILATM1B data to ATL06 data from October 2018.\n", + "result = ipx.Query(\n", + " \"ATL06\",\n", + " [\n", + " BBOX.lower_left_lon,\n", + " BBOX.lower_left_lat,\n", + " BBOX.upper_right_lon,\n", + " BBOX.upper_right_lat,\n", + " ],\n", + " DATE_RANGE,\n", + ")\n", + "result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0c146fa2-fdb9-4774-a6bf-481e2498948a", + "metadata": {}, + "outputs": [], + "source": [ + "# Download the results. Note that this may take a while and download [TODO AMOUNT] to local disk.\n", + "result.download_granules(\"downloaded-data/ATL06/\")" + ] + }, + { + "cell_type": "markdown", + "id": "d12d21ed-0774-4f58-b75e-56f2734d27a6", + "metadata": {}, + "source": [ + "Next, we will use [icepyx.Read](https://icepyx.readthedocs.io/en/latest/_icepyx/icepyx.Read.html#icepyx.Read) to read the data into an [xarray Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).\n", + "\n", + "Note that the code below wraps the reading of each file in a `try/except` block because of a known issue with subsetting. See https://github.com/icesat2py/icepyx/issues/576 for more information." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "081376b5-308b-43c3-bc6d-b9c4342586d0", + "metadata": {}, + "outputs": [], + "source": [ + "datasets = []\n", + "for file in Path(\"downloaded-data/ATL06/\").glob(\"*.h5\"):\n", + " try:\n", + " reader = ipx.Read(str(file))\n", + " reader.vars.append(var_list=[\"h_li\", \"latitude\", \"longitude\"])\n", + " ds = reader.load()\n", + " datasets.append(ds)\n", + " except:\n", + " print(f\"{file=} contains an error and will not be read\")\n", + " continue\n", + "\n", + "len(datasets)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc7fc4b2-37ee-4f44-a21e-d6225eccc0f8", + "metadata": {}, + "outputs": [], + "source": [ + "ds = xr.concat(datasets, dim=\"gran_idx\")\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "id": "31d5cf8f-750e-4e9d-9cda-431d0a1a6a61", + "metadata": {}, + "source": [ + "`icepyx` reads ICESat2 data as an xarray dataset. `xarray` is a powerful tool and the data is ready to use in this format, but to simplify things for this notebook and make the data more compatible with `iceflow`, the next step will convert the data into an `iceflow`-compatible [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09a51188-be09-486d-be10-a7bede9d02ce", + "metadata": {}, + "outputs": [], + "source": [ + "# Rename vars to be consistent with how\n", + "# `iceflow` identifies the elevation and time fields\n", + "ds = ds.rename({\"h_li\": \"elevation\", \"delta_time\": \"utc_datetime\"})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "422d0a32-b676-4fe2-89ba-14f31bba6e80", + "metadata": {}, + "outputs": [], + "source": [ + "# Convert to an iceflow dataframe\n", + "spot_dfs = []\n", + "for spot in ds.spot.data.flatten():\n", + " df = pd.DataFrame.from_dict(\n", + " {\n", + " \"elevation\": ds.sel(spot=spot).elevation.data.flatten(),\n", + " \"latitude\": ds.sel(spot=spot).latitude.data.flatten(),\n", + " \"longitude\": ds.sel(spot=spot).longitude.data.flatten(),\n", + " \"utc_datetime\": ds.sel(spot=spot).utc_datetime.data.flatten(),\n", + " \"spot\": [spot] * len(ds.sel(spot=spot).elevation.data.flatten()),\n", + " \"ITRF\": ICESAT2_ITRF,\n", + " },\n", + " )\n", + " spot_dfs.append(df)\n", + "\n", + "\n", + "df = pd.concat(spot_dfs, sort=True)\n", + "# Drop rows where lat, lon, or elev are missing.\n", + "df = df.dropna(subset=[\"latitude\", \"longitude\", \"elevation\"], how=\"any\")\n", + "df = df.set_index(\"utc_datetime\")\n", + "# Cast the df as an IceflowDataFrame. \n", + "# The `atl06_df` can now be used with e.g., `iceflow`'s ITRF conversion function \n", + "# to perform plate motion model adjustments if desired.\n", + "atl06_df = IceflowDataFrame(df)\n", + "atl06_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2200a3ff-3cac-441d-a4ac-09ef25fd1c66", + "metadata": {}, + "outputs": [], + "source": [ + "# The ATL06 data contains some negative elevation values\n", + "atl06_df.elevation.min()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "826ace9e-0638-4240-8769-5b0a1385b173", + "metadata": {}, + "outputs": [], + "source": [ + "# Filter out negative values. We expect positive elevations.\n", + "atl06_df = atl06_df[atl06_df.elevation > 0]" + ] + }, + { + "cell_type": "markdown", + "id": "c3c8966f-482c-4ecc-91a5-c6dce34f37de", + "metadata": {}, + "source": [ + "### Average elevation data over our area of interest at a weekly resolution\n", + "\n", + "In this next step, we will resample the ATL06 and `iceflow` data to a weekly resolution, taking the mean of elevation over our area of interest. This will provide us with one data point per week where data is available, giving us a general idea of how the elevation of our area of interest changes over time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fb60aa6c-4b9d-4592-95be-4687636ccd1b", + "metadata": {}, + "outputs": [], + "source": [ + "# resample the ATL06 data to a weekly mean.\n", + "atl06_avg_df = atl06_df[[\"elevation\"]].resample(\"W\").mean()\n", + "atl06_avg_df = atl06_avg_df.dropna(how=\"any\")\n", + "atl06_avg_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5a76ba1-d2b1-49fd-96f7-bf41264c26a6", + "metadata": {}, + "outputs": [], + "source": [ + "# resample the ATL06 data to a weekly mean.\n", + "iceflow_df_sampled = iceflow_df.repartition(freq=\"1W\")\n", + "iceflow_df_sampled = iceflow_df_sampled.dropna(how=\"any\")\n", + "\n", + "iceflow_df_sampled = iceflow_df_sampled[iceflow_df_sampled.elevation > 0]\n", + "\n", + "iceflow_avg = iceflow_df_sampled.resample(\"W\").agg({\n", + " \"elevation\": \"mean\",\n", + " \"dataset\": lambda x: \", \".join(x.astype(\"str\").unique()), \n", + "})\n", + "iceflow_avg = iceflow_avg.replace([np.inf, -np.inf], np.nan) \n", + "iceflow_avg = iceflow_avg.dropna(how=\"any\")\n", + "iceflow_avg = iceflow_avg.compute()\n", + "\n", + "iceflow_avg" + ] + }, + { + "cell_type": "markdown", + "id": "4a530b6e-2474-400e-8f04-a5665587591b", + "metadata": {}, + "source": [ + "### Plot the results as a timeseries\n", + "\n", + "Now we will use [matplotlib](https://matplotlib.org/) to plot the results as a timeseries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e733fe6-cc28-4696-aadf-272fe755ad2c", + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# We will use a unique marker for each dataset\n", + "iceflow_marker_map = {\n", + " \"GLAH06v034\": \"s\",\n", + " \"BLATM1Bv1\": \"D\",\n", + " \"ILATM1Bv1\": \"x\",\n", + " \"ILATM1Bv2\": \"o\",\n", + " \"ILVIS2v1\": \"v\",\n", + " \"ILVIS2v2\": \"^\",\n", + "}\n", + "\n", + "fig = plt.figure(figsize=(10, 8))\n", + "\n", + "for dataset, marker in iceflow_marker_map.items():\n", + " subset = iceflow_avg[iceflow_avg.dataset == dataset]\n", + " if subset.elevation.any():\n", + " plt.scatter(subset.index, subset.elevation, marker=marker, label=dataset, linestyle=\"\", color=\"black\")\n", + "\n", + "plt.scatter(atl06_avg_df.index, atl06_avg_df.elevation, color=\"black\", marker=\"*\", label=\"ATL06v6\", linestyle=\"\")\n", + "\n", + "plt.xlabel(\"Date\")\n", + "plt.ylabel(\"Elevation\")\n", + "plt.legend(title=\"Dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "33313a43-93e9-4391-b5c8-db85008f2cae", + "metadata": {}, + "source": [ + "## Conclusions\n", + "\n", + "In this notebook, we found and analyzed laser altimetry data from a variety of datasets using `iceflow` and `icepyx`. \n", + "\n", + "In the timeseries plot above, we can see how the surface elevation of a small area near Sermeq Kujalleq (Jakobshavn Isbrae) changes over time.\n", + "\n", + "Note that this analysis was relatively simple. Although the data and plot above give us an idea of surface elevation changes, it should be noted that there are still a number of things a researcher should consider when doing an analysis across many datasets over time. To further this analysis, we may want to consider doing one or more of the following:\n", + "\n", + "* Outlier detection and filtering\n", + "* Cross-calibration of data between sensors/datasets\n", + "* Plate motion model coordinate propagation (see the [NSIDC Iceflow example Jupyter Notebook](https://iceflow.readthedocs.io/en/latest/iceflow-example.html) for an example of how to do this)\n", + "* Account for spatial variability within our region of interest" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}