-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy path03-ClimateData.Rmd
328 lines (245 loc) · 14.8 KB
/
03-ClimateData.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
---
title: "03-ClimateData"
author: "Dimitrios Markou"
date: "2024-07-23"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Chapter 3: Climate Data
##### Authors: Dimitrios Markou, Danielle Ethier
| In [Chapter 2](02-SpatialSubsets.Rmd), you applied geoprocessing functions to spatial (vector) objects, and filtered and visualized NatureCounts data within KBAs and Priority Places. Now you are ready to explore your NatureCounts data in relation to environmental covariates. In this tutorial, you will use NatureCounts and climate data to explore possible patterns in Whooping Crane observations, annual mean temperature, and annual mean precipitation within the Wood Buffalo National Park, the province of Alberta, and beyond.
# 3.0 Learning Objectives
By the end of **Chapter 3 - Climate Data**, users will know how to:
- Download and preprocess vector and raster climate data
- Combine NatureCounts observations with climate data
- Visualize NatureCounts and climate data using plots and spatio-temporal maps
This tutorial utilizes the following bird occurrence, spatial, and climate data sources:
| Data | Description |
|------------------------------------|------------------------------------|
| [eBird Canada (Prairies)](https://naturecounts.ca/nc/default/datasets.jsp?code=EBIRD-CA-PR) | NatureCounts, EBIRD-CA-PR (1800-2024) |
| Alberta Breeding Bird Atlas | NatureCounts, ABATLAS1 ([1987-1992](https://naturecounts.ca/nc/default/datasets.jsp?code=ABATLAS1)) and ABATLAS2 ([2000-2005](https://naturecounts.ca/nc/default/datasets.jsp?code=ABATLAS2)) |
| [Alberta Bird Records](https://naturecounts.ca/nc/default/datasets.jsp?code=ABBIRDRECS) | NatureCounts, ABIRDRECS (1941-2006) |
| [Whooping Crane Nesting Area and Summer Range](https://kbacanada.org/site/?SiteCode=NT002) | Key Biodiversity Area boundary (**.shp**) and site attributes |
| [Environment and Climate Change Canada](https://climate.weather.gc.ca/historical_data/search_historic_data_e.html) | Historical (vector) weather data, accessed through the `weathercan` R package |
| [WorldClim](https://www.worldclim.org/data/bioclim.html) | Historical (raster) climate data. Includes nineteen bioclimatic variables representing the 1970-2000 average |
Load the necessary packages:
```{r package library, warning = FALSE, message = FALSE}
library(tidyverse)
library(naturecounts)
library(sf)
library(lubridate)
library(raster)
library(terra)
library(leaflet)
# weathercan is in the R-Universe rather than on CRAN, so we install the package a little differently
# install.packages("weathercan",
# repos = c("https://ropensci.r-universe.dev",
# "https://cloud.r-project.org"))
library(weathercan)
```
# 3.1 Data Setup
```{r}
collections <- meta_collections()
View(meta_collections())
```
```{r}
search_species("whooping crane")
View(search_species("whooping crane"))
```
To download the NatureCounts data, you can specify the collection and species code relevant to your research. Replace `testuser` with your user name.
> The data download will not work unless you replace `"testuser"` with your actual user name. You will be prompted to enter your password.
```{r}
whooping_crane_data <- nc_data_dl(collections = c("ABATLAS1", "ABATLAS2", "ABBIRDRECS"), species = 4030, username = "testuser", info = "spatial_data_tutorial")
```
eBird has the greatest number of Whooping Crane records, however, this collection comprise data of Access Level 4. If you wish to access this collection you must sign up for a free account and make a [data request](https://naturecounts.ca/nc/default/explore.jsp#table). Otherwise, you can carry forward with the tutorial without these data and skip this code chunk.
```{r}
whooping_crane_data <- nc_data_dl(collections = c("ABATLAS1", "ABATLAS2", "ABBIRDRECS","EBIRD-CA-PR"), species = 4030, username = "testuser", info = "spatial_data_tutorial")
```
Grouping the NatureCounts data might provide more meaningful insight on species occurrences at each site and how they might vary across time and space. You can achieve this by summarizing species occurrence by SiteCode and survey year, while keeping the coordinate information for each site using the `group_by` and `summarise` functions:
```{r}
species_occurrence_summary <- whooping_crane_data %>%
formate_dates() %>%
group_by(SiteCode, survey_year, longitude, latitude) %>%
summarise(total_count = sum(as.numeric(ObservationCount), na.rm = TRUE)) %>%
ungroup()
```
Lastly, you can summarize the NatureCounts data once more to get the annual total counts of Whooping Cranes, say between 1985 and 2011. To do so, we can 1) produce a regular dataframe called **cranes_summary** by dropping the geometry column (`st_drop_geometry`) 2) calculate the **annual_count** by using the `summarise` and `sum` functions and 3) filter for NA values and by **year:**
```{r}
cranes_summary <- species_occurrence_summary %>%
st_drop_geometry() %>%
group_by(survey_year) %>%
summarize(annual_count = sum(total_count, na.rm = TRUE)) %>%
filter(!is.na(survey_year)) %>%
filter(survey_year >= 1985 & survey_year <= 2011)
cranes_summary
```
# 3.2 Weathercan (Vector) Data
`weathercan` is an R package designed to help access historical weather data from ECCC. Steffi LaZerte's online tutorial, [Integrating data from weathercan](https://ropensci.org/blog/2018/03/06/weathercan/), provides more details on how to download, filter, and visualize weather data in R.
First, let's take a look at the built in **stations** dataset:
```{r}
stations <- weathercan::stations()
head(stations)
```
You are ready to download weather data! For the purpose of this tutorial, let's perform a smaller weather data download by specifying specific station IDs. The weather stations closest to Wood Buffalo National Park are BIRCH MOUNTAIN LO (station_id = 2481) and BUCKTON LO (station_id = 2486). These stations were identified using the the ECCC [search option](https://climate.weather.gc.ca/historical_data/search_historic_data_stations_e.html?searchType=stnProx&timeframe=1&txtRadius=25&selCity=&optProxType=park&selPark=57%7C39%7C112%7C0%7CWood+Buffalo+National+Park&txtCentralLatDeg=&txtCentralLatMin=&txtCentralLatSec=&txtCentralLongDeg=&txtCentralLongMin=&txtCentralLongSec=&txtLatDecDeg=&txtLongDecDeg=&optLimit=yearRange&StartYear=1840&EndYear=2024&Year=2024&Month=10&Day=1&selRowPerPage=25) which helps you search for historical weather data based on proximity to a city, coordinate, or National Park.
```{r message = FALSE}
birch_station <- stations_search(name = "BIRCH MOUNTAIN LO", interval = "day")
buckton_station <- stations_search(name = "BUCKTON LO", interval = "day")
wood_buff_stations <- rbind(birch_station, buckton_station) # combine
```
Now you can perform the weather data download (give it a few minutes).
```{r message = FALSE}
weather <- weather_dl(station_ids = wood_buff_stations$station_id,
start = "1985-01-01",
end = "2011-12-31",
interval = "day", quiet = TRUE)
```
NOTE - Larger data downloads can be performed based on filtered stations datasets as well as start, end, and interval specifications. Large historical weather downloads can take a while!
Here is an example where we filtered based on province, interval, station operation period, and elevation to exclude extreme measurements in the weather data download:
> To execute this code chunk, remove the \#
```{r}
#prov_stations <- stations %>%
# filter(prov %in% c("AB"),
# interval == "day",
# start >= 1960,
# end <= 2023,
# elev < 1000)
```
> To execute this code chunk, remove the \#
```{r, eval = FALSE}
# weather <- weather_dl(station_ids = prov_stations$station_id,
# start = "2000-01-01",
# end = "2023-12-31",
# interval = "day", quiet = TRUE)
```
Next, convert the date column to Date data type. This will allow you to apply more filters based on month and survey year:
```{r}
weather$date <- as.Date(weather$date)
```
Finally, to summarize the weather data by calculating the annual mean summer temperature and precipitation we can use the familiar `filter`, `group_by`, and `summarise` functions:
```{r}
yearly_climate <- weather %>%
filter(month(date) %in% 5:8) %>% # Filter for summer months May (5) through August (8)
group_by(survey_year = year(date)) %>% # Group by year
summarize(
yearly_avg_temp = mean(mean_temp, na.rm = TRUE),
yearly_avg_precip = mean(total_precip, na.rm = TRUE)
)
yearly_climate
```
Now that you have prepared the NatureCounts and weathercan climate data you can combine the data together for analysis.
```{r}
cranes_climate_summary <- left_join(cranes_summary, yearly_climate, by="survey_year")
cranes_climate_summary
```
To visualize average annual summer temperature and total count, you can use a scatterplot.
```{r}
ggplot(cranes_climate_summary, aes(x = yearly_avg_temp, y = annual_count)) +
geom_point(color = "blue", size = 3, alpha = 0.7) + # Simple blue points
geom_smooth(method = "lm", se = FALSE, color = "red") + # Add linear regression line
theme_minimal() +
labs(x = "Avg Temperature (°C)",
y = "Total Count",
title = "Whooping Crane Count vs. Annual Avg Summer Temp") +
theme(plot.title = element_text(hjust = 0.5)) # Center the title
```
And do the same for annual summer precipitation:
```{r}
ggplot(cranes_climate_summary, aes(x = yearly_avg_precip, y = annual_count)) +
geom_point(color = "blue", size = 3, alpha = 0.7) + # Simple blue points
geom_smooth(method = "lm", se = FALSE, color = "red") + # Add linear regression line
theme_minimal() +
labs(x = "Avg Precipitation",
y = "Total Count",
title = "Whooping Crane Count vs. Annual Avg Summer Precip") +
theme(plot.title = element_text(hjust = 0.5)) # Center the title
```
# 3.3 WorldClim (Raster) Data
[WorldClim](https://www.worldclim.org/data/index.html) provides high spatial resolution global weather and climate data. Their data is in raster GeoTiff (.tif) file format and comprises a variety of variables including 19 [bioclimatic](https://www.worldclim.org/data/bioclim.html) variables representing annual trends, seasonality, and extreme environmental factors.
Download the bioclimatic data on [this webpage](https://www.worldclim.org/data/worldclim21.html). The data are available at the four spatial resolutions, between 30 seconds (\~1 km2) to 10 minutes (\~340 km2). Each download is a "zip" file containing 19 GeoTiff (.tif) files, one for each month of the variables. For this tutorial, we recommend you download the 10 minute resolution.
After dowloading the WorldClim data to your working directory, list all the files in the raster stack by specifying the path to your folder (use `list.files` function) and read them into R using the `rast` function.
This code will get the path to your working directory.
```{r}
getwd()
```
Now point the `list.file` to this directory using the sample code below.
```{r}
worldclim_list <- list.files(path = "C:/YOUR/PATH/HERE", pattern = "\\.tif$", full.names = TRUE)
# Read and stack the raster files
worldclim_stack <- rast(worldclim_list)
# Print information about the stack
print(worldclim_stack)
```
Great! The raster stack which includes the 19 bio-climatic variables has been read into R successfully. We can simplify the layer names which represent each variable, respectively:
```{r}
names(worldclim_stack) <- paste0("bio", 1:19)
```
Let's take a look at the first layer 'bio1' which is Annual Mean Temperature. To do so, you can subset the layers of the SpatRaster with \$ or two sets of square brackets [ [ ] ].
```{r}
# Plot the first bio-climatic layer (e.g., bio1: Annual Mean Temperature)
plot(worldclim_stack[["bio1"]], main = "Annual Mean Temperature (bio1)")
# Summary statistics for each layer
summary(worldclim_stack)
```
Convert the **species_occurences_summary**, which summarizes species occurrence by SiteCode, into a spatial object using the `st_as_sf` function:
```{r}
cranes_sf <- sf::st_as_sf(species_occurrence_summary,
coords = c("longitude", "latitude"), crs = 4326)
```
Extract the bio-climatic variable values for each bird observation site, respectively.
```{r}
bioclim_values <- extract(worldclim_stack, cranes_sf)
```
Then combine the bio-climatic values with the NatureCounts data.
```{r}
bioclim_data <- cbind(cranes_sf, bioclim_values, longitude = species_occurrence_summary$longitude, latitude = species_occurrence_summary$latitude)
```
Filter the data to only include observations made post 1970 to match the temporal resolution of the climate dataset:
```{r}
bioclim_data <- bioclim_data %>%
mutate(date = as.Date(date)) %>%
filter(year(date) >= "1970")
summary(bioclim_data)
```
Visualize the distribution of **bio1** compared to **total_count**:
```{r}
# Scatter plot of bird observations against a bioclimatic variable (bio1 = Annual Mean Temperature)
ggplot(data = bioclim_data, aes(x = bio1, y = total_count)) +
geom_point(alpha = 0.6) +
labs(title = "Bird Observations vs. Bioclimatic Variable (bio1)",
x = "Annual Mean Temperature",
y = "ObservationCount") +
theme_minimal()
```
Map the NatureCounts data by grouping it by **SiteCode**, sizing the points by **total_count** and colorizing them by a bio-climatic variable (**bio1**):
```{r}
bioclim_data <- st_transform(bioclim_data, crs = 4326)
```
Now let's make an interactive maps of total counts per site.
```{r}
# Group by SiteCode and summarize total_count
bioclim_data <- bioclim_data %>%
group_by(SiteCode) %>%
summarize(total_count = sum(total_count, na.rm = TRUE),
bio1 = mean(bio1, na.rm = TRUE),
latitude = first(latitude),
longitude = first(longitude))
# Define color palette for bio1
pal <- colorNumeric(palette = "YlOrRd", domain = bioclim_data$bio1)
# Define a scaling factor for total_count
scaling_factor <- 2 # Adjust this to scale the circle size
# Plot using leaflet
leaflet(bioclim_data) %>%
addTiles() %>%
addCircleMarkers(
~longitude, ~latitude,
radius = ~pmin(pmax(sqrt(total_count) * scaling_factor, 3), 15), # Scale and limit radius
fillColor = ~pal(bio1), # Color points by bio1
color = "black", stroke = TRUE, fillOpacity = 0.8,
popup = ~paste0("<strong>SiteCode:</strong> ", SiteCode, "<br>", "<strong>Total Count:</strong> ", total_count, "<br>", "<strong>Bio1:</strong> ", bio1) # creates info popup labels
) %>%
addLegend(pal = pal, values = ~bio1, title = "Bio1", position = "bottomright") %>%
addScaleBar(position = "bottomleft", options = scaleBarOptions(imperial = FALSE))
```
**Congratulations**! You've completed **Chapter 3 - Climate Data**. Here, you successfully combined NatureCounts data with vector and raster climate data in R. In [Chapter 4](04-ElevationData.Rmd) you can explore how-to link NatureCounts observations with elevation data.