---
title: "07-SummaryTools"
author: "Dimitrios Markou"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Chapter 7: Summary Tools
##### Authors: Dimitrios Markou, Danielle Ethier
> In Chapter 6, you downloaded satellite imagery from the Copernicus SENTINEL-2 mission and calculated spectral indices (NDWI, NDVI) over [La Mauricie National Park](https://parks.canada.ca/pn-np/qc/mauricie/nature) to combine with NatureCounts data from the [Quebec Breeding Bird Atlas (2010 - 2014)](https://naturecounts.ca/nc/default/datasets.jsp?code=QCATLAS2PC&sec=bmdr). In this chapter, you will create data summaries and visualizations using NatureCounts data and environmental covariates.
**This chapter uses the data products prepared in Chapters 4-6 of the Spatial_Data_Tutorial, and the National Park boundary and NatureCounts data downloaded in section 4.1 Data Setup from [Chapter 4: Elevation Data](04-ElevationData.Rmd). For quick access, all data are available for download via the [Google Drive](https://drive.google.com/drive/folders/1gLUC6fROl4kNBvTGselhZif-arPexZbY?usp=sharing) data folder. If you wish to gain experience in how to download, process, and save the environmental layers yourself, return to the earlier chapters of this tutorial series and explore the Additional Resources articles.**
# 7.0 Learning Objectives {#7.0LearningObjectives}
By the end of **Chapter 7 - Summary Tools**, users will know how to:
- Create and visualize NatureCounts and environmental data summaries using four key examples: Landscape Association Plot, Species Rank Plot, Elevation Plot, and NDVI Plot
Load the required packages:
```{r, eval = TRUE, warning = FALSE, message = FALSE}
library(tidyverse)
library(sf)
```
# 7.1 Data Setup {#7.4SummaryTools}
In [Chapter 4](04-ElevationData.Rmd), [Chapter 5](05-LandcoverData.Rmd), and [Chapter 6](06-SatelliteImagery.Rmd), you extracted elevation, land cover, and NDVI values, respectively, at bird observation sites across La Mauricie National Park. These data were uploaded to the [Google Drive data folder](https://drive.google.com/drive/folders/1gLUC6fROl4kNBvTGselhZif-arPexZbY?usp=sharing) for your convenience.
Let's read all the environmental covariates into R and join them into a common dataframe.
```{r, message = FALSE}
# List the covariate CSV files
env_covariates <- list.files(path = "data/mauricie/env_covariates",
                             pattern = "\\.csv$",
                             full.names = TRUE)
# Read each CSV into a list of dataframes
env_covariates_list <- lapply(env_covariates, read_csv)
# Join the covariate dataframes on their shared record_id column
env_covariates_df <- Reduce(function(x, y) left_join(x, y, by = "record_id"), env_covariates_list)
```
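To see what the `Reduce()` call is doing, here is a minimal sketch on toy data. The three small tables and their values are hypothetical stand-ins for the covariate CSVs; only the `record_id` key mirrors the real data.

```r
library(dplyr)

# Hypothetical covariate tables, each keyed by record_id
toy_elevation <- tibble(record_id = c(1, 2, 3), elevation = c(150, 320, 410))
toy_landcover <- tibble(record_id = c(1, 2, 3), ed = c(12.5, 8.1, 20.3))
toy_ndvi      <- tibble(record_id = c(1, 3),    ndvi = c(0.55, 0.72))

toy_list <- list(toy_elevation, toy_landcover, toy_ndvi)

# Reduce() applies left_join() pairwise, left to right:
# (toy_elevation joined to toy_landcover) joined to toy_ndvi
toy_joined <- Reduce(function(x, y) left_join(x, y, by = "record_id"), toy_list)

toy_joined
```

Because `left_join()` keeps every row of the left-hand table, `record_id` 2 is retained with an `NA` for `ndvi`. Rows are only dropped if they are missing from the first table in the list.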
Read in the NatureCounts data you saved or downloaded from [Chapter 4: Elevation Data](04-ElevationData.Rmd).
```{r, warning = FALSE, message = FALSE}
mauricie_birds_df <- read_csv("data/mauricie/mauricie_birds_df.csv")
```
Create an `sf` object from the NatureCounts data that represents the unique point count locations.
```{r}
mauricie_birds_sf <- mauricie_birds_df %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
```
Trim the `sf` object down by selecting key attribute columns.
```{r}
mauricie_birds_sf <- mauricie_birds_sf %>%
select(record_id, species_id, SiteCode, Locality, SamplingEventIdentifier, RouteIdentifier, survey_year, survey_month, survey_day, english_name, ObservationCount)
```
Combine the environmental covariates with the NatureCounts data.
```{r}
mauricie_data <- mauricie_birds_sf %>%
merge(env_covariates_df, by = "record_id")
```
Assign a point identifier to each location based on its unique geometry.
```{r}
mauricie_data_summary <- mauricie_data %>%
group_by(SiteCode, geometry) %>%
mutate(point_id = cur_group_id()) %>%
ungroup() %>%
distinct()
```
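The sketch below shows how `cur_group_id()` hands out one identifier per group. It is an illustration only: the toy table groups on hypothetical `longitude` and `latitude` columns rather than on an `sf` geometry column, and the site codes and species are made up.

```r
library(dplyr)

# Hypothetical observations: five records at three distinct locations
toy_obs <- tibble(
  SiteCode     = c("A", "A", "B", "B", "B"),
  longitude    = c(-73.0, -73.0, -72.9, -72.9, -72.8),
  latitude     = c(46.8, 46.8, 46.7, 46.7, 46.7),
  english_name = c("Ovenbird", "Veery", "Ovenbird", "Hermit Thrush", "Veery")
)

# Every row in the same (SiteCode, longitude, latitude) group gets the same id
toy_points <- toy_obs %>%
  group_by(SiteCode, longitude, latitude) %>%
  mutate(point_id = cur_group_id()) %>%
  ungroup()

toy_points
```

Records that share a site code but differ in coordinates receive different identifiers, which is why the grouping uses the location as well as `SiteCode`.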
Convert the `sf` object back to a regular dataframe.
```{r}
mauricie_data_summary <- st_drop_geometry(mauricie_data_summary)
```
Calculate the species richness and abundance at each point.
```{r}
biodiversity_count <- mauricie_data_summary %>%
  group_by(point_id) %>%
  summarise(n_species = n_distinct(english_name),                 # Count unique species
            n_individuals = sum(ObservationCount, na.rm = TRUE),  # Sum the individuals counted
            .groups = "drop")
```
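As a quick check of what these two summaries return, here is a small sketch on hypothetical counts. The column names match the chapter, but the values are invented for illustration.

```r
library(dplyr)

# Hypothetical records: two points, one missing count
toy_counts <- tibble(
  point_id         = c(1, 1, 1, 2, 2),
  english_name     = c("Ovenbird", "Ovenbird", "Veery", "Veery", "Hermit Thrush"),
  ObservationCount = c(2, 1, NA, 3, 1)
)

toy_biodiversity <- toy_counts %>%
  group_by(point_id) %>%
  summarise(
    n_species     = n_distinct(english_name),             # richness: unique species
    n_individuals = sum(ObservationCount, na.rm = TRUE),  # abundance: NA counts are skipped
    .groups = "drop"
  )

toy_biodiversity
```

Note that a species with a missing count still contributes to richness (the Veery at point 1) but adds nothing to abundance.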
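Drop all columns except `point_id` and the environmental variables (the geometry was already dropped in the previous step), then collapse the rows for each location into one. Note that `paste()` returns text, so the collapsed environmental variables are stored as character columns.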
```{r}
enviro_data_df <- mauricie_data_summary %>%
select(-record_id, -species_id, -english_name, -ObservationCount, -SiteCode,
-Locality, -SamplingEventIdentifier, -RouteIdentifier, -survey_year, -survey_month, -survey_day) %>%
group_by(point_id) %>%
summarise(across(everything(), ~ {
x <- na.omit(.)
if (length(x) == 0) NA else paste(unique(x), collapse = ", ")
}, .names = "{.col}"), .groups = "drop") %>%
ungroup()
```
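The collapsing step can be hard to picture, so here is a minimal sketch of the same pattern on a toy table. The `elevation` and `landcover` values are hypothetical, and the sketch uses `NA_character_` for empty groups as a small simplification.

```r
library(dplyr)

# Hypothetical environmental values, repeated across records at each point
toy_env <- tibble(
  point_id  = c(1, 1, 2, 2),
  elevation = c(150, 150, 320, NA),
  landcover = c("forest", "forest", "forest", "wetland")
)

toy_collapsed <- toy_env %>%
  group_by(point_id) %>%
  summarise(across(everything(), ~ {
    x <- na.omit(.x)                      # drop missing values
    if (length(x) == 0) NA_character_ else paste(unique(x), collapse = ", ")
  }), .groups = "drop")

toy_collapsed
```

Each point now has a single row, but `elevation` has become text ("150", "320"), and a point with conflicting values keeps both ("forest, wetland"). If you need numbers for plotting or modelling, one option is to convert the relevant columns back with `as.numeric()` after checking that no cell holds more than one value.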
Join the `biodiversity_count` dataframe.
```{r}
enviro_data_df <- enviro_data_df %>%
left_join(biodiversity_count, by = "point_id")
```
We can summarize the combined NatureCounts and environmental data to explore possible patterns in species abundance or richness.
# 7.3 Species Rank
Species rank plots show the relative abundance (number of individuals) of each species in a community, with species sorted in ascending or descending order of abundance.
Group the NatureCounts data by species and rank them in order of abundance.
```{r}
species_rank <- mauricie_data %>%
  filter(!is.na(english_name)) %>% # Remove rows with NA in english_name
  group_by(english_name) %>% # Group by species only
  summarize(total_abundance = sum(ObservationCount, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_abundance)) %>% # Sort in descending order of abundance
  slice_max(total_abundance, n = 40, with_ties = FALSE) %>% # Keep exactly the top 40 species
  mutate(rank = row_number()) # Assign rank to each species
```
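One detail worth knowing about `slice_max()`: it keeps tied rows by default, so asking for the top `n` can return more than `n` rows unless `with_ties = FALSE` is set. A small sketch with hypothetical abundances:

```r
library(dplyr)

# Hypothetical totals with a tie for second place
toy_abundance <- tibble(
  english_name    = c("Ovenbird", "Veery", "Hermit Thrush", "Blue Jay"),
  total_abundance = c(10, 7, 7, 2)
)

# Default behaviour: both species tied at 7 are kept
top2_with_ties <- toy_abundance %>%
  slice_max(total_abundance, n = 2)

# with_ties = FALSE returns exactly n rows
top2_exact <- toy_abundance %>%
  slice_max(total_abundance, n = 2, with_ties = FALSE)

nrow(top2_with_ties)
nrow(top2_exact)
```

Which behaviour is preferable depends on the question: keeping ties avoids an arbitrary cut between equally abundant species, while dropping them guarantees a fixed number of bars or points in the plot.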
Plot the abundance of each species and its rank across the entire park.
```{r}
ggplot(species_rank, aes(x = reorder(english_name, rank), y = total_abundance)) +
geom_line(group = 1, linewidth = 1, color = "black") + # linewidth replaces size for lines in ggplot2 >= 3.4.0
geom_point(size = 2, color = "darkgreen") +
theme_minimal() +
labs(
title = "Abundance of High Rank Species across La Mauricie Park",
x = "Species",
y = "Total Abundance"
) +
theme(
axis.text.x = element_text(size = 8, angle = 60, hjust = 1)
)
```
Represent Species Rank as a horizontal bar plot.
```{r}
ggplot(species_rank, aes(y = reorder(english_name, rank), x = total_abundance)) +
geom_col(fill = "steelblue") +
theme_minimal() +
labs(
title = "Abundance of High Rank Species across La Mauricie Park",
y = "Species",
x = "Total Abundance"
) +
theme(axis.text.y = element_text(size = 7))
```
Alternatively, group the NatureCounts data by species and `point_id`, then rank the species by abundance within each point.
```{r}
species_rank_pointID <- mauricie_data_summary %>% # point_id was added to mauricie_data_summary, not mauricie_data
filter(!is.na(english_name)) %>% # Remove rows with NA in english_name
group_by(point_id, english_name) %>%
summarize(total_abundance = sum(ObservationCount, na.rm = TRUE), .groups = "drop") %>%
arrange(point_id, desc(total_abundance)) %>%
group_by(point_id) %>%
mutate(rank = row_number())
```
Plot abundance against species rank, with one line per `point_id`.
```{r}
ggplot(species_rank_pointID, aes(x = as.numeric(rank), y = total_abundance, color = factor(point_id))) +
geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
geom_point(size = 2) +
theme_minimal() +
labs(
title = "Abundance of High Rank Species across La Mauricie Park",
x = "Species Rank",
y = "Abundance",
color = "Point ID"
) +
scale_color_manual(values = grDevices::colorRampPalette(RColorBrewer::brewer.pal(12, "Set3"))(length(unique(species_rank_pointID$point_id)))) # Set3 has at most 12 colors, so interpolate to one color per point
```
# 7.4 Elevation
Plot species richness for three elevation classes.
NOTE: This comparison assumes that sampling is random with regard to elevation, which is unlikely.
```{r}
# Create elevation classes with labels
mauricie_data <- mauricie_data %>%
mutate(
elevation_class = case_when(
elevation < 200 ~ "Low",
elevation >= 200 & elevation < 400 ~ "Mid",
elevation >= 400 ~ "High"
),
elevation_class = factor(elevation_class, levels = c("Low", "Mid", "High")) # Set the factor levels
)
# Calculate species richness for each elevation class
elevation_summary <- mauricie_data %>%
group_by(elevation_class) %>%
summarize(n_species = n_distinct(english_name), .groups = "drop")
# Plot species richness per elevation class
ggplot(elevation_summary, aes(x = elevation_class, y = n_species)) +
geom_bar(stat = "identity", color = "black", fill = "steelblue") +
geom_text(
aes(label = paste0("n = ", n_species)),
vjust = -0.5, # Position the text above the bars
size = 3.5 # Adjust text size
) +
theme_minimal() +
labs(
title = "Species Richness by Elevation Class",
x = "Elevation Class",
y = "Species Richness"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate labels if needed
```
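The class boundaries used above can also be expressed with base R's `cut()`, which may be easier to maintain if you add more classes. This is an alternative sketch rather than the chapter's method, and the elevation values are hypothetical.

```r
# Hypothetical elevations, including the two boundary values and a missing value
toy_elev <- c(120, 200, 399.9, 400, 650, NA)

# right = FALSE makes each interval closed on the left: [200, 400) is "Mid"
toy_class <- cut(
  toy_elev,
  breaks = c(-Inf, 200, 400, Inf),
  labels = c("Low", "Mid", "High"),
  right  = FALSE
)

data.frame(elevation = toy_elev, elevation_class = toy_class)
```

With `right = FALSE`, a value of exactly 200 falls in "Mid" and exactly 400 falls in "High", matching the `case_when()` conditions. `cut()` also returns a factor with the levels already in the order given by `labels`, and missing elevations stay `NA`.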
# 7.5 Landscape
Plot the species richness and mean landscape metrics.
```{r}
# Summarize species richness, mean ed, and mean np by point_id
landscape_summary <- mauricie_data_summary %>% # point_id was added to mauricie_data_summary, not mauricie_data
group_by(point_id) %>%
summarize(
n_species = n_distinct(english_name), # Species richness
mean_ed = mean(ed, na.rm = TRUE), # Mean edge density
mean_np = mean(np, na.rm = TRUE), # Mean number of patches
.groups = "drop"
)
# Scatterplot: Species richness vs. Mean edge density
ggplot(landscape_summary, aes(x = mean_ed, y = n_species)) +
geom_point(color = "blue", size = 3, alpha = 0.7) +
labs(
title = "Species Richness vs. Edge Density",
x = "Mean Edge Density",
y = "Species Richness"
) +
theme_minimal()
# Scatterplot: Species richness vs. Mean number of patches
ggplot(landscape_summary, aes(x = mean_np, y = n_species)) +
geom_point(color = "green", size = 3, alpha = 0.7) +
labs(
title = "Species Richness vs. Number of Patches",
x = "Mean Number of Patches",
y = "Species Richness"
) +
theme_minimal()
```
# 7.6 NDVI Plots
Plot the relative species abundance for different NDVI ranges.
```{r}
# Create NDVI classes labelled by their range; values outside (0, 0.8] become NA
mauricie_data <- mauricie_data %>%
  mutate(
    ndvi_range = case_when(
      ndvi > 0 & ndvi <= 0.2 ~ "0 to 0.2",
      ndvi > 0.2 & ndvi <= 0.4 ~ "0.2 to 0.4",
      ndvi > 0.4 & ndvi <= 0.6 ~ "0.4 to 0.6",
      ndvi > 0.6 & ndvi <= 0.8 ~ "0.6 to 0.8"
    )
  )
# Calculate species abundance for each NDVI class
ndvi_summary <- mauricie_data %>%
group_by(ndvi_range) %>%
summarize(
total_individuals = sum(ObservationCount, na.rm = TRUE),
total_species = n_distinct(english_name) # scientific_name was not kept in the select() step in 7.1
) %>%
mutate(
relative_abundance = total_individuals / sum(total_individuals)
)
# Plot relative species abundance per NDVI class
ggplot(ndvi_summary, aes(x = ndvi_range, y = relative_abundance, fill = total_individuals)) +
geom_bar(stat = "identity", color = "black") +
geom_text(
aes(label = paste0("n = ", total_individuals)),
vjust = -0.5, # Position the text above the bars
size = 3.5 # Adjust text size
) +
scale_fill_gradient(
low = "lightgreen", high = "darkgreen", # Customize colors for the gradient
name = "Abundance" # Legend title
) +
theme_minimal() +
labs(
title = "Relative Species Abundance According to NDVI Distribution",
x = "NDVI",
y = "Relative Abundance"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
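To make the relative abundance calculation concrete, here is a minimal sketch on hypothetical NDVI values and counts. It uses the same class boundaries as the chunk above and deliberately includes one value (0.9) that falls outside every class.

```r
library(dplyr)

# Hypothetical records: NDVI at the observation site and individuals counted
toy_ndvi_obs <- tibble(
  ndvi             = c(0.15, 0.35, 0.55, 0.58, 0.75, 0.9),
  ObservationCount = c(1, 2, 3, 1, 2, 1)
)

toy_ndvi_summary <- toy_ndvi_obs %>%
  mutate(ndvi_range = case_when(
    ndvi > 0   & ndvi <= 0.2 ~ "0 to 0.2",
    ndvi > 0.2 & ndvi <= 0.4 ~ "0.2 to 0.4",
    ndvi > 0.4 & ndvi <= 0.6 ~ "0.4 to 0.6",
    ndvi > 0.6 & ndvi <= 0.8 ~ "0.6 to 0.8"
  )) %>%
  group_by(ndvi_range) %>%
  summarise(total_individuals = sum(ObservationCount, na.rm = TRUE), .groups = "drop") %>%
  # Share of all counted individuals that fall in each class
  mutate(relative_abundance = total_individuals / sum(total_individuals))

toy_ndvi_summary
```

The record with NDVI 0.9 ends up in an `NA` class and still counts toward the denominator, so the four labelled classes sum to less than 1. If that is not what you want, filter out `NA` classes before computing `relative_abundance`, or add a class for values above 0.8.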
Plot species richness and minimum NDVI grouped by SiteCode.
```{r}
# Calculate species richness by SiteCode
# (drop the geometry first: sf's left_join() method rejects an sf object as the second table)
species_richness <- mauricie_data %>%
  st_drop_geometry() %>%
  group_by(SiteCode) %>%
  summarise(species_richness = n_distinct(english_name))
# Calculate minimum NDVI by SiteCode
min_ndvi <- mauricie_data %>%
  st_drop_geometry() %>%
  group_by(SiteCode) %>%
  summarise(min_ndvi = min(ndvi, na.rm = TRUE))
# Merge species richness and minimum NDVI by SiteCode
species_ndvi_data <- left_join(species_richness, min_ndvi, by = "SiteCode")
# Plot species richness and minimum NDVI
ggplot(species_ndvi_data, aes(x = min_ndvi, y = species_richness)) +
geom_point() +
labs(title = "Species Richness and Minimum NDVI at Each Observation Site",
x = "Min NDVI",
y = "Bird Species Richness") +
theme_minimal()
```
------------------------------------------------------------------------
Congratulations! You completed Chapter 7 - Summary Tools. In this chapter, you combined NatureCounts data with environmental covariates and created summary plots of species rank, elevation, landscape metrics, and NDVI.