From 2b92ce57299f4109686a1c1921d6be6aaa452a24 Mon Sep 17 00:00:00 2001
From: Trey Stafford <trey.stafford@colorado.edu>
Date: Wed, 30 Oct 2024 10:05:39 -0600
Subject: [PATCH] Add notebook showing example usage of iceflow with icepyx

---
 notebooks/iceflow-with-icepyx.ipynb | 437 ++++++++++++++++++++++++++++
 1 file changed, 437 insertions(+)
 create mode 100644 notebooks/iceflow-with-icepyx.ipynb

diff --git a/notebooks/iceflow-with-icepyx.ipynb b/notebooks/iceflow-with-icepyx.ipynb
new file mode 100644
index 0000000..07e60bd
--- /dev/null
+++ b/notebooks/iceflow-with-icepyx.ipynb
@@ -0,0 +1,437 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7174387c-05aa-4f8f-b3bd-902d4f635d9b",
+   "metadata": {},
+   "source": [
+    "# Using iceflow with icepyx\n",
+    "\n",
+    "This notebook shows how to compare iceflow-supported datasets with ICESat2 data using [icepyx](https://github.com/icesat2py/icepyx).\n",
+    "\n",
+    "If this is your first time using `iceflow`, we recommend starting with the [NSIDC Iceflow example Jupyter Notebook](https://iceflow.readthedocs.io/en/latest/iceflow-example.html) first.\n",
+    "\n",
+    "Similarly, if you are new to `icepyx`, we suggest reviewing the [icepyx documentation](https://icepyx.readthedocs.io/en/latest/) for more information about how to use `icepyx`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64fb8fa5-e304-4607-9bad-33d3832cd70b",
+   "metadata": {},
+   "source": [
+    "## Scenario: assessing ice surface elevation change near Sermeq Kujalleq (Jakobshavn Isbrae)\n",
+    "\n",
+    "In this notebook, we will focus on a small area near Sermeq Kujalleq (Jakobshavn Isbrae).\n",
+    "\n",
+    "The steps this notebook will take are:\n",
+    "\n",
+    "* Search for and download iceflow-supported data for our area of interest and timeframe.\n",
+    "* Search for and download ICESat2 data using `icepyx`\n",
+    "* Average elevation data over our area of interest at a weekly resolution\n",
+    "* Plot the results as a timeseries\n",
+    "\n",
+    "But first, we setup our environment and define some constants that will be used for the rest of the notebook:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2ad99b0d-4e15-466c-b9aa-7bfa75ef5f26",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Imports\n",
+    "from pathlib import Path\n",
+    "import datetime as dt\n",
+    "\n",
+    "import dask.dataframe as dd\n",
+    "import icepyx as ipx\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import xarray as xr\n",
+    "\n",
+    "from nsidc.iceflow.api import fetch_iceflow_df, create_iceflow_parquet\n",
+    "from nsidc.iceflow.data import ALL_DATASETS\n",
+    "from nsidc.iceflow.data.models import DatasetSearchParameters, BoundingBox, IceflowDataFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d7143806-d23d-421a-8b7a-3ca15b05f7cf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Constants\n",
+    "\n",
+    "# This bounding box covers an area near Sermeq Kujalleq (Jakobshavn Isbrae)\n",
+    "BBOX = BoundingBox(lower_left_lon=-49.149, lower_left_lat=69.186, upper_right_lon=-48.949, upper_right_lat=69.238)\n",
+    "\n",
+    "# Range of dates we want to evaluate\n",
+    "DATE_RANGE = (dt.date(2007, 1, 1), dt.date(2024, 10, 28))\n",
+    "\n",
+    "# All of our data will be downloaded to this location. \n",
+    "# NOTE: the data supported by iceflow and icepyx can be very large!\n",
+    "OUTPUT_DIR =  Path(\"./downloaded-data/\")\n",
+    "\n",
+    "# ICESat2 data products use ITRF2014 (e.g., see https://nsidc.org/data/atl06/versions/6):\n",
+    "# > WGS 84 ellipsoid, ITRF2014 reference frame\n",
+    "# NOTE/TODO: This is expected to change in the near future! ICESat2 release\n",
+    "# 7, scheduled for spring 2025, is expected to be referenced to ITRF2020.\n",
+    "ICESAT2_ITRF = \"ITRF2014\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "037235e4-598d-4870-bbcc-c29a9f5ee173",
+   "metadata": {},
+   "source": [
+    "### Search for and download iceflow-supported data for our area of interest and timeframe.\n",
+    "\n",
+    "`iceflow`'s API provides a function `create_iceflow_parquet` that:\n",
+    "\n",
+    "* Finds data matching our `BBOX`, `DATE_RANGE`, and desired datasets\n",
+    "* Downloads and reads the data from the datasets' native formats\n",
+    "* Transforms all of the lat/lon/elev data into a target ITRF\n",
+    "* Writes out the lat/lon/elev data to a parquet dataset that can be be read by e.g., `dask` for further processing. \n",
+    "\n",
+    "Writing data to a parquet dataset allows `dask` (which we will use later!) to read the chunks of data it needs to do calculations (e.g., `mean`) without needing to read all of the data into memory at once. This is important because `iceflow` can find many millions of data points for even small areas of interest!\n",
+    "\n",
+    "Note: This next step may take a while, and will download approximately [TODO AMOUNT] of data to your local disk."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "713b93dd-d23b-422b-92dc-f9ed364a566c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "parquet_path = create_iceflow_parquet(\n",
+    "    dataset_search_params=DatasetSearchParameters(\n",
+    "        # `ALL_DATASETS` is a list that contains all of the datasets supported by iceflow.\n",
+    "        # This lets us find all the data iceflow supports for our area of interest.\n",
+    "        datasets=ALL_DATASETS,\n",
+    "        bounding_box=BBOX,\n",
+    "        temporal=DATE_RANGE,\n",
+    "    ),\n",
+    "    target_itrf=ICESAT2_ITRF,\n",
+    "    output_dir=OUTPUT_DIR,\n",
+    "    overwrite=True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e29f9cf9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now, read the data stored in the parquet dataset using dask\n",
+    "\n",
+    "iceflow_df = dd.read_parquet(parquet_path)\n",
+    "\n",
+    "# Ensure that our index is set as a datetime object \n",
+    "iceflow_df = iceflow_df.reset_index()\n",
+    "iceflow_df[\"utc_datetime\"] = dd.to_datetime(iceflow_df[\"utc_datetime\"])\n",
+    "iceflow_df = iceflow_df.set_index(\"utc_datetime\")\n",
+    "\n",
+    "iceflow_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "925eede5-e9be-4793-9b1d-9c5562c73b7c",
+   "metadata": {},
+   "source": [
+    "#  Search for and download ICESat2 data using `icepyx`\n",
+    "\n",
+    "Next, we will use icepyx to find [ATL06](https://nsidc.org/data/atl06/versions/6) data for the same area of interest and timeframe. \n",
+    "\n",
+    "To learn more about `icepyx.Query`, which is used below, see the [documentation](https://icepyx.readthedocs.io/en/latest/_icepyx/icepyx.Query.html#icepyx.Query)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bda00a2c-51dd-4d01-8ba4-4dcb0985931d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We will compare the ILATM1B data to ATL06 data from October 2018.\n",
+    "result = ipx.Query(\n",
+    "    \"ATL06\",\n",
+    "    [\n",
+    "        BBOX.lower_left_lon,\n",
+    "        BBOX.lower_left_lat,\n",
+    "        BBOX.upper_right_lon,\n",
+    "        BBOX.upper_right_lat,\n",
+    "    ],\n",
+    "    DATE_RANGE,\n",
+    ")\n",
+    "result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c146fa2-fdb9-4774-a6bf-481e2498948a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download the results. Note that this may take a while and download [TODO AMOUNT] to local disk.\n",
+    "result.download_granules(\"downloaded-data/ATL06/\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d12d21ed-0774-4f58-b75e-56f2734d27a6",
+   "metadata": {},
+   "source": [
+    "Next, we will use [icepyx.Read](https://icepyx.readthedocs.io/en/latest/_icepyx/icepyx.Read.html#icepyx.Read) to read the data into an [xarray Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).\n",
+    "\n",
+    "Note that the code below wraps the reading of each file in a `try/except` block because of a known issue with subsetting. See https://github.com/icesat2py/icepyx/issues/576 for more information."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "081376b5-308b-43c3-bc6d-b9c4342586d0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "datasets = []\n",
+    "for file in Path(\"downloaded-data/ATL06/\").glob(\"*.h5\"):\n",
+    "    try:\n",
+    "        reader = ipx.Read(str(file))\n",
+    "        reader.vars.append(var_list=[\"h_li\", \"latitude\", \"longitude\"])\n",
+    "        ds = reader.load()\n",
+    "        datasets.append(ds)\n",
+    "    except:\n",
+    "        print(f\"{file=} contains an error and will not be read\")\n",
+    "        continue\n",
+    "\n",
+    "len(datasets)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dc7fc4b2-37ee-4f44-a21e-d6225eccc0f8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds = xr.concat(datasets, dim=\"gran_idx\")\n",
+    "ds"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "31d5cf8f-750e-4e9d-9cda-431d0a1a6a61",
+   "metadata": {},
+   "source": [
+    "`icepyx` reads ICESat2 data as an xarray dataset. `xarray` is a powerful tool and the data is ready to use in this format, but to simplify things for this notebook and make the data more compatible with `iceflow`, the next step will convert the data into an `iceflow`-compatible [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "09a51188-be09-486d-be10-a7bede9d02ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Rename vars to be consistent with how\n",
+    "# `iceflow` identifies the elevation and time fields\n",
+    "ds = ds.rename({\"h_li\": \"elevation\", \"delta_time\": \"utc_datetime\"})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "422d0a32-b676-4fe2-89ba-14f31bba6e80",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Convert to an iceflow dataframe\n",
+    "spot_dfs = []\n",
+    "for spot in ds.spot.data.flatten():\n",
+    "    df = pd.DataFrame.from_dict(\n",
+    "        {\n",
+    "            \"elevation\": ds.sel(spot=spot).elevation.data.flatten(),\n",
+    "            \"latitude\": ds.sel(spot=spot).latitude.data.flatten(),\n",
+    "            \"longitude\": ds.sel(spot=spot).longitude.data.flatten(),\n",
+    "            \"utc_datetime\": ds.sel(spot=spot).utc_datetime.data.flatten(),\n",
+    "            \"spot\": [spot] * len(ds.sel(spot=spot).elevation.data.flatten()),\n",
+    "            \"ITRF\": ICESAT2_ITRF,\n",
+    "        },\n",
+    "    )\n",
+    "    spot_dfs.append(df)\n",
+    "\n",
+    "\n",
+    "df = pd.concat(spot_dfs, sort=True)\n",
+    "# Drop rows where lat, lon, or elev are missing.\n",
+    "df = df.dropna(subset=[\"latitude\", \"longitude\", \"elevation\"], how=\"any\")\n",
+    "df = df.set_index(\"utc_datetime\")\n",
+    "# Cast the df as an IceflowDataFrame. \n",
+    "# The `atl06_df` can now be used with e.g., `iceflow`'s ITRF conversion function \n",
+    "# to perform plate motion model adjustments if desired.\n",
+    "atl06_df = IceflowDataFrame(df)\n",
+    "atl06_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2200a3ff-3cac-441d-a4ac-09ef25fd1c66",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The ATL06 data contains some negative elevation values\n",
+    "atl06_df.elevation.min()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "826ace9e-0638-4240-8769-5b0a1385b173",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Filter out negative values. We expect positive elevations.\n",
+    "atl06_df = atl06_df[atl06_df.elevation > 0]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3c8966f-482c-4ecc-91a5-c6dce34f37de",
+   "metadata": {},
+   "source": [
+    "### Average elevation data over our area of interest at a weekly resolution\n",
+    "\n",
+    "In this next step, we will resample the ATL06 and `iceflow` data to a weekly resolution, taking the mean of elevation over our area of interest. This will provide us with one data point per week where data is available, giving us a general idea of how the elevation of our area of interest changes over time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fb60aa6c-4b9d-4592-95be-4687636ccd1b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# resample the ATL06 data to a weekly mean.\n",
+    "atl06_avg_df = atl06_df[[\"elevation\"]].resample(\"W\").mean()\n",
+    "atl06_avg_df = atl06_avg_df.dropna(how=\"any\")\n",
+    "atl06_avg_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c5a76ba1-d2b1-49fd-96f7-bf41264c26a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# resample the ATL06 data to a weekly mean.\n",
+    "iceflow_df_sampled = iceflow_df.repartition(freq=\"1W\")\n",
+    "iceflow_df_sampled = iceflow_df_sampled.dropna(how=\"any\")\n",
+    "\n",
+    "iceflow_df_sampled = iceflow_df_sampled[iceflow_df_sampled.elevation > 0]\n",
+    "\n",
+    "iceflow_avg = iceflow_df_sampled.resample(\"W\").agg({\n",
+    "    \"elevation\": \"mean\",\n",
+    "    \"dataset\": lambda x: \", \".join(x.astype(\"str\").unique()),   \n",
+    "})\n",
+    "iceflow_avg = iceflow_avg.replace([np.inf, -np.inf], np.nan) \n",
+    "iceflow_avg = iceflow_avg.dropna(how=\"any\")\n",
+    "iceflow_avg = iceflow_avg.compute()\n",
+    "\n",
+    "iceflow_avg"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a530b6e-2474-400e-8f04-a5665587591b",
+   "metadata": {},
+   "source": [
+    "### Plot the results as a timeseries\n",
+    "\n",
+    "Now we will use [matplotlib](https://matplotlib.org/) to plot the results as a timeseries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e733fe6-cc28-4696-aadf-272fe755ad2c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "\n",
+    "# We will use a unique marker for each dataset\n",
+    "iceflow_marker_map = {\n",
+    "    \"GLAH06v034\": \"s\",\n",
+    "    \"BLATM1Bv1\": \"D\",\n",
+    "    \"ILATM1Bv1\": \"x\",\n",
+    "    \"ILATM1Bv2\": \"o\",\n",
+    "    \"ILVIS2v1\": \"v\",\n",
+    "    \"ILVIS2v2\": \"^\",\n",
+    "}\n",
+    "\n",
+    "fig = plt.figure(figsize=(10, 8))\n",
+    "\n",
+    "for dataset, marker in iceflow_marker_map.items():\n",
+    "    subset = iceflow_avg[iceflow_avg.dataset == dataset]\n",
+    "    if subset.elevation.any():\n",
+    "        plt.scatter(subset.index, subset.elevation, marker=marker, label=dataset, linestyle=\"\", color=\"black\")\n",
+    "\n",
+    "plt.scatter(atl06_avg_df.index, atl06_avg_df.elevation, color=\"black\", marker=\"*\", label=\"ATL06v6\", linestyle=\"\")\n",
+    "\n",
+    "plt.xlabel(\"Date\")\n",
+    "plt.ylabel(\"Elevation\")\n",
+    "plt.legend(title=\"Dataset\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "33313a43-93e9-4391-b5c8-db85008f2cae",
+   "metadata": {},
+   "source": [
+    "## Conclusions\n",
+    "\n",
+    "In this notebook, we found and analyzed laser altimetry data from a variety of datasets using `iceflow` and `icepyx`. \n",
+    "\n",
+    "In the timeseries plot above, we can see how the surface elevation of a small area near Sermeq Kujalleq (Jakobshavn Isbrae) changes over time.\n",
+    "\n",
+    "Note that this analysis was relatively simple. Although the data and plot above give us an idea of surface elevation changes, it should be noted that there are still a number of things a researcher should consider when doing an analysis across many datasets over time. To further this analysis, we may want to consider doing one or more of the following:\n",
+    "\n",
+    "* Outlier detection and filtering\n",
+    "* Cross-calibration of data between sensors/datasets\n",
+    "* Plate motion model coordinate propagation (see the [NSIDC Iceflow example Jupyter Notebook](https://iceflow.readthedocs.io/en/latest/iceflow-example.html) for an example of how to do this)\n",
+    "* Account for spatial variability within our region of interest"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}