---
title: "01-SpatialDataExploration"
author: "Dimitrios Markou"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Chapter 1: Spatial Data Exploration
##### Authors: Dimitrios Markou, Danielle Ethier
| Exploring your data visually is an important first step in any analysis. This chapter gives you a foundational understanding of how to work with your NatureCounts data spatially in R. It assumes that you have a basic understanding of how to access your data from NatureCounts. If you are new to the `naturecounts` R package, start with the [NatureCounts Introductory R Tutorial](https://github.com/BirdsCanada/NatureCounts_IntroTutorial.git), which explains how to access, view, filter, manipulate, and visualize NatureCounts data. We recommend reviewing that tutorial before proceeding.
# 1.0 Learning Objectives
By the end of **Chapter 1 - Spatial Data Exploration**, users will know how to:
- Distinguish between types of spatial data (vector vs raster)
- Select from a variety of geoprocessing functions in the `sf` package
- Visualize NatureCounts data using spatio-temporal maps
This R tutorial requires the following **packages**:
```{r, message = FALSE}
library(naturecounts)
library(sf)
library(tidyverse)
library(ggspatial)
```
# 1.1 Spatial Data Types
Spatial data is any type of vector or raster data that represents a feature or phenomenon across geographic space.
| **Vector data** is used to represent features with points, lines and polygons. This may include individual bird observations, rivers, or conservation area boundaries.
| **Raster data** is used to represent spatially continuous data with a grid, where each cell has one value. This may include types of environmental data like temperature, elevation or land use.
The most common format used to store vector data on disk is the **ESRI Shapefile** format *(.shp)*. A shapefile is not a single file: the *.shp* file must be accompanied by *.shx* and *.dbf* files, and it usually comes with a *.prj* file that stores the coordinate reference system.
Raster data are typically stored as TIFF or GeoTIFF files with a *(.tif)* or *(.tiff)* extension. Raster data manipulation will be covered in subsequent chapters (see [Chapter 3: Climate Data](03-ClimateData.Rmd), [Chapter 4: Elevation Data](), [Chapter 5: Landcover Data](05-LandcoverData.Rmd), [Chapter 6: Satellite Imagery](06-SatelliteImagery.Rmd), and [Chapter 7: Raster Summary](07-RasterSummaryStats)).
Vector and raster data may also be associated with **attribute data** or **temporal data**. Attribute data provides additional information on the characteristics of spatial features while temporal data assigns a specific date or time range.
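As a minimal sketch of these ideas, the chunk below builds a small point layer by hand with `sf`. The site names, counts, dates, and coordinates are made up for illustration and are not NatureCounts records.

```{r}
library(sf)

# Hypothetical observations: the geometry comes from the coordinates,
# `site` and `count` are attribute data, and `obs_date` is temporal data
obs <- data.frame(
  site      = c("A", "B", "C"),
  count     = c(4, 1, 7),
  obs_date  = as.Date(c("2008-06-02", "2008-06-15", "2009-06-20")),
  longitude = c(-63.6, -64.8, -66.1),
  latitude  = c(44.6, 46.1, 45.3)
)

# Vector data: each row becomes a point feature in WGS 84 (EPSG:4326)
obs_sf <- st_as_sf(obs, coords = c("longitude", "latitude"), crs = 4326)

# Geometry type of each feature, and the names of the non-spatial columns
st_geometry_type(obs_sf)
names(st_drop_geometry(obs_sf))
```

The coordinate columns are folded into a single geometry column, while the attribute and date columns stay available as ordinary data frame columns.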
The `sf` package provides [simple feature](https://r-spatial.github.io/sf/) access in R. This package works best with spatial data (point, line, polygon, multipolygon) associated with tabular attributes (e.g., shapefiles). You may be familiar with the `sp` package, which offers similar functionality in a different format. However, `sp` is no longer under active development, several packages it relied on (`rgdal`, `rgeos`, and `maptools`) were retired in 2023, and it does not integrate well with the `tidyverse`, which is very popular among data scientists who use R.
# 1.2 Geoprocessing Functions
Geoprocessing functions allow us to manipulate or compute spatial objects based on interactions between their geometries. There are several useful functions integrated into the `sf` package including:
| `st_transform()` - transforms the coordinates of an sf object to a specified coordinate reference system (CRS)
| `st_drop_geometry()` - removes the geometry column of an sf object
| `st_intersection(x, y)` - creates geometry of the shared portion of x and y
| `st_crop(x, y, ..., xmin, ymin, xmax, ymax)` - creates geometry of x that intersects a specified rectangular extent (bounding box)
| `st_difference(x, y)` - creates geometry from x that does not intersect with y
| `st_area()`, `st_length()`, and `st_distance()` - compute geometric measurements
More resources, including an `sf` package **cheatsheet**, can be found [here](https://github.com/r-spatial/sf).
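The functions listed above can be tried on toy geometries before applying them to real data. The following is a minimal sketch: the `make_square()` helper and the two squares are hypothetical, and UTM zone 20N (EPSG:32620) is assumed here only because it is a metre-based CRS covering parts of the Maritimes.

```{r}
library(sf)

# Hypothetical helper that builds an axis-aligned square polygon
make_square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two overlapping 1 km squares in UTM zone 20N (units in metres)
x <- st_sfc(make_square(400000, 5000000, 1000), crs = 32620)
y <- st_sfc(make_square(400500, 5000500, 1000), crs = 32620)

# Shared portion of x and y, and the part of x that lies outside y
shared <- st_intersection(x, y)
x_only <- st_difference(x, y)

# Areas in square metres
st_area(shared)
st_area(x_only)

# Crop x to a rectangular extent defined as a bounding box
crop_box <- st_bbox(
  c(xmin = 400000, ymin = 5000000, xmax = 400250, ymax = 5001000),
  crs = st_crs(x)
)
x_crop <- st_crop(x, crop_box)
st_area(x_crop)

# Reproject to geographic coordinates (WGS 84) and check the CRS
x_wgs84 <- st_transform(x, crs = 4326)
st_crs(x_wgs84)$epsg

# Wrap the geometry in an sf object with one attribute, then drop the geometry
x_sf <- st_sf(name = "x", geometry = x)
st_drop_geometry(x_sf)
```

The squares overlap in a 500 m by 500 m corner, so the shared area is 250,000 square metres and the remainder of `x` is 750,000 square metres. Simple cases like this make it easy to confirm that a function does what you expect.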
# 1.3 Spatio-temporal Mapping
#### *Example 1* - You would like to visualize the spatio-temporal distribution of Cedar Waxwing observations in June of each survey year using data from the Maritimes Breeding Bird Atlas (2006-2010).
Let's fetch the NatureCounts data.
First, we look to find the `collection` code for the Maritimes Breeding Bird Atlas.
```{r}
# Retrieve the collections metadata once, then browse it interactively
collections <- meta_collections()
View(collections)
```
Second, we look to find the numeric species ID.
```{r}
search_species("cedar waxwing")
```
Now we can download the data.
> The data download will not work unless you replace `"testuser"` with your actual user name. You will be prompted to enter your password.
```{r}
cedar_waxwing <- nc_data_dl(collections = "MBBA2PC", species = 16330, username = "testuser", info = "spatial_data_tutorial")
```
Use the [format_dates](https://rdrr.io/github/BirdStudiesCanada/naturecounts/man/format_dates.html) function to create date and day-of-year (doy) columns.
```{r}
cedar_waxwing <- format_dates(cedar_waxwing)
```
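To make the result of this step concrete, here is a rough base R sketch of the same idea: combining year, month, and day columns into a date and deriving the day of year. It uses a made-up data frame and is not the `naturecounts` implementation. The column names `survey_year`, `survey_month`, and `survey_day` are assumed to match those in a NatureCounts download.

```{r}
# Hypothetical survey records with the three date components
surveys <- data.frame(
  survey_year  = c(2006, 2008, 2010),
  survey_month = c(6, 6, 7),
  survey_day   = c(1, 15, 4)
)

# Combine the components into a Date column
surveys$date <- as.Date(ISOdate(surveys$survey_year,
                                surveys$survey_month,
                                surveys$survey_day))

# Day of year (1 to 366) derived from the date
surveys$doy <- as.numeric(format(surveys$date, "%j"))

surveys
```

Because 2008 is a leap year, 15 June falls on day 167 rather than day 166, which is worth keeping in mind when comparing day-of-year values across years.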
Filter the data to only include observations from the month of June.
```{r}
cedar_waxwing_june <- cedar_waxwing %>%
  filter(survey_month == 6)
```
Convert the NatureCounts data to a spatial object using the point count coordinates.
```{r}
cedar_waxwing_june_sf <- sf::st_as_sf(cedar_waxwing_june,
                                      coords = c("longitude", "latitude"),
                                      crs = 4326)
```
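By default, `st_as_sf()` stops with an error if any coordinate is missing. The sketch below, which uses a made-up data frame rather than NatureCounts data, shows one way to drop incomplete rows before converting. Whether your own download contains missing coordinates is something to check, not something assumed here.

```{r}
library(sf)

# Hypothetical records, one of which has no coordinates
records <- data.frame(
  record_id = 1:4,
  longitude = c(-63.6, -64.8, NA, -66.1),
  latitude  = c(44.6, 46.1, NA, 45.3)
)

# Keep only the rows where both coordinates are present
has_coords <- !is.na(records$longitude) & !is.na(records$latitude)
records_complete <- records[has_coords, ]

# Convert to sf; EPSG:4326 is WGS 84 longitude/latitude
records_sf <- st_as_sf(records_complete,
                       coords = c("longitude", "latitude"),
                       crs = 4326)

# Number of features kept and their spatial extent
nrow(records_sf)
st_bbox(records_sf)
```

Checking the bounding box after conversion is a quick way to spot swapped longitude and latitude columns, since the extent would fall far outside the study area.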
Finally, use `ggplot2` to visualize the spatio-temporal distribution of Cedar Waxwing observations across the Maritime provinces by color-coding the data points by **survey_year** and creating a multi-panel plot based on this discrete variable:
```{r warning=FALSE, error=FALSE}
ggplot(data = cedar_waxwing_june_sf) +
  # Select a basemap
  annotation_map_tile(type = "cartolight", zoom = NULL) +
  # Plot the points, color-coded by survey_year
  geom_sf(aes(color = as.factor(survey_year)), size = 1) +
  # Facet by survey_year to create the multi-paneled map
  facet_wrap(~ survey_year) +
  # Customize the color scale
  scale_color_brewer(palette = "Set1", name = "Survey Year") +
  # Use a minimal theme and place the legend below the panels
  theme_minimal() +
  theme(legend.position = "bottom") +
  # Make the points in the legend larger without affecting the map points
  guides(color = guide_legend(override.aes = list(size = 3))) +
  # Define the title and axis names
  labs(title = "Cedar Waxwing June Observations by Survey Year",
       x = "Longitude",
       y = "Latitude")
```
The map above provides a simple visualization of NatureCounts data over a broad spatial and temporal scale.
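A simple count of records per year can complement the map, since panels with few points are hard to compare by eye. The sketch below uses a small made-up point layer as a stand-in for `cedar_waxwing_june_sf`, so the years and coordinates are illustrative only.

```{r}
library(sf)

# Hypothetical June point records standing in for cedar_waxwing_june_sf
pts <- st_as_sf(
  data.frame(
    survey_year = c(2006, 2006, 2007, 2009, 2009, 2009),
    longitude   = c(-63.6, -64.8, -66.1, -63.1, -65.5, -64.2),
    latitude    = c(44.6, 46.1, 45.3, 46.2, 47.0, 45.0)
  ),
  coords = c("longitude", "latitude"), crs = 4326
)

# Drop the geometry column, then count the records in each survey year
year_counts <- table(st_drop_geometry(pts)$survey_year)
year_counts
```

Years with no records do not appear in the table at all, just as they produce no panel in the faceted map.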
**Congratulations!** You completed **Chapter 1: Spatial Data Exploration**. Here, you successfully visualized NatureCounts vector data over a wide spatial and temporal scale using a multi-panel plot. In [Chapter 2](02-SpatialFiltering.Rmd), you can explore spatial data manipulation, apply geoprocessing functions, and visualize NatureCounts data within [Key Biodiversity Areas (KBAs)](https://kbacanada.org/about/) and [Priority Places](https://open.canada.ca/data/en/dataset/91219d24-e877-4c8a-8bd2-b2b662e573e0).