07-SummaryTools.Rmd

---
title: "07-SummaryTools"
author: "Dimitrios Markou"
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Chapter 7: Summary Tools

##### Authors: Dimitrios Markou, Danielle Ethier

> In Chapter 6, you downloaded satellite imagery from the Copernicus SENTINEL-2 mission and calculated spectral indices (NDWI, NDVI) over [La Mauricie National Park](https://parks.canada.ca/pn-np/qc/mauricie/nature) to combine with NatureCounts data from the [Quebec Breeding Bird Atlas (2010 - 2014)](https://naturecounts.ca/nc/default/datasets.jsp?code=QCATLAS2PC&sec=bmdr). In this chapter, you will create data summaries and visualizations using NatureCounts data and environmental covariates.

**This chapter uses the data products prepared in Chapters 4-6 of the Spatial_Data_Tutorial, and the National Park boundary and NatureCounts data downloaded in section 4.1 Data Setup from [Chapter 4: Elevation Data](04-ElevationData.Rmd). For quick access, all data are available for download via the [Google Drive](https://drive.google.com/drive/folders/1gLUC6fROl4kNBvTGselhZif-arPexZbY?usp=sharing) data folder. If you wish to gain experience in how to download, process, and save the environmental layers yourself, return to the earlier chapters of this tutorial series and explore the Additional Resources articles.**

# 7.0 Learning Objectives {#7.0LearningObjectives}

By the end of **Chapter 7 - Summary Statistics**, users will know how to:

-   Create and visualize NatureCounts and environmental data summaries using four key examples: Landscape Association Plot, Species Rank Plot, Elevation Plot, and NDVI Plot

Load the required packages:

```{r, eval = TRUE, warning = FALSE, message = FALSE}
library(tidyverse)
library(sf)
```

# 7.1 Data Setup {#7.4SummaryTools}

In [Chapter 4](04-ElevationData.Rmd), [Chapter 5](05-LandcoverData.Rmd), and [Chapter 6](06-SatelliteImagery.Rmd) you extracted elevation, land cover, and NDVI values, respectively over bird observation sites across La Mauricie National Park. These data were uploaded to the [Google Drive data folder](https://drive.google.com/drive/folders/1gLUC6fROl4kNBvTGselhZif-arPexZbY?usp=sharing) for your convenience.

Let's download all the environmental covariates into R and join them to a common dataframe.

```{r, message = FALSE}
# List the dataframes
env_covariates <- list.files(path = "data/mauricie/env_covariates", 
                             pattern = "\\.csv$", 
                             full.names = TRUE)

# Read each CSV into a list of dataframes
env_covariates_list <- lapply(env_covariates, read_csv)

# Combine NatureCounts and environmental covariates 
env_covariates_df <- Reduce(function(x, y) left_join(x, y, by = "record_id"), env_covariates_list)
```

Read in the NatureCounts data you saved or downloaded from [Chapter 4: Elevation Data](04-ElevationData.Rmd).

```{r, warning = FALSE, message = FALSE}
mauricie_birds_df <- read_csv("data/mauricie/mauricie_birds_df.csv") 
```

Create an `sf` object from the NatureCounts data that represents the unique point count locations.

```{r}
mauricie_birds_sf <- mauricie_birds_df %>% 
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) 
```

Trim the `sf` object down by selecting key attribute columns.

```{r}
mauricie_birds_sf <- mauricie_birds_sf %>%
  select(record_id, species_id, SiteCode, Locality, SamplingEventIdentifier, RouteIdentifier, survey_year, survey_month, survey_day, english_name, ObservationCount)
```

Combine the environmental covariates with the NatureCounts data.

```{r}
mauricie_data <- mauricie_birds_sf %>%
  merge(env_covariates_df, by = "record_id")
```

Assign a point identifier to each location based on its unique geometry.

```{r}
mauricie_data_summary <- mauricie_data %>%
  group_by(SiteCode, geometry) %>%
  mutate(point_id = cur_group_id()) %>%
  ungroup() %>%
  distinct()
```

Convert the `sf` object back to a regular dataframe. 

```{r}
mauricie_data_summary <- st_drop_geometry(mauricie_data_summary)
```

Calculate the species richness and abundance at each point.

```{r}
biodiversity_count <- mauricie_data_summary  %>%
  group_by(point_id) %>%
  summarise(n_species = n_distinct(english_name),
            n_individuals = sum(ObservationCount, na.rm = TRUE), .groups = "drop") # Count unique species
```

Drop all columns except `point_id`, the environmental variables, and `geometry` and group rows together by location. 

```{r}
enviro_data_df <- mauricie_data_summary %>%
  select(-record_id, -species_id, -english_name, -ObservationCount, -SiteCode, 
         -Locality, -SamplingEventIdentifier, -RouteIdentifier, -survey_year, -survey_month, -survey_day) %>%
  group_by(point_id) %>%
  summarise(across(everything(), ~ {
    x <- na.omit(.)
    if (length(x) == 0) NA else paste(unique(x), collapse = ", ")
  }, .names = "{.col}"), .groups = "drop") %>%
  ungroup()
```

Join the `biodiversity_count` dataframe. 

```{r}
enviro_data_df <- enviro_data_df %>%
  left_join(biodiversity_count, by = "point_id")
```

We can summarize the combined NatureCounts and environmental data to explore possible patterns in species abundance or richness.

# 7.3 Species Rank

Species rank plots show the relative abundance (number of individuals) of a species in a community. The number of individuals of each species are sorted in ascending or descending order. 

Group the NatureCounts data by species and rank them in order of abundance.

```{r}
species_rank <- mauricie_data %>%
  filter(!is.na(english_name)) %>%  # Remove rows with NA in english_name
  group_by(english_name) %>%  # Group by species only
  summarize(total_abundance = sum(ObservationCount, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_abundance)) %>%  # Sort in descending order of abundance
  slice_max(total_abundance, n = 40) %>%  # Keep only the top 40 species
  mutate(rank = row_number())  # Assign rank to each species
```

Plot the abundance of each species and its rank across the entire park.

```{r}
ggplot(species_rank, aes(x = reorder(english_name, rank), y = total_abundance)) +
  geom_line(group = 1, size = 1, color = "black") +
  geom_point(size = 2, color = "darkgreen") +
  theme_minimal() +
  labs(
    title = "Abundance of High Rank Species across La Mauricie Park",
    x = "Species",
    y = "Total Abundance"
  ) +
  theme(
    axis.text.x = element_text(size = 8, angle = 60, hjust = 1)
  )
```

Represent Species Rank as a horizontal bar plot.

```{r}
ggplot(species_rank, aes(y = reorder(english_name, rank), x = total_abundance)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Abundance of High Rank Species across La Mauricie Park",
    y = "Species",
    x = "Total Abundance"
  ) +
  theme(axis.text.y = element_text(size = 7))

```

OPTION 2

Group the NatureCounts data by species and point_id and rank them in order of abundance.

```{r}
species_rank_pointID <- mauricie_data %>%
  filter(!is.na(english_name)) %>%  # Remove rows with NA in english_name
  group_by(point_id, english_name) %>%
  summarize(total_abundance = sum(ObservationCount, na.rm = TRUE), .groups = "drop") %>%
  arrange(point_id, desc(total_abundance)) %>%
  group_by(point_id) %>%
  mutate(rank = row_number())
```

Plot the abundance of each species and its rank for each point_id, respectively.

```{r}
ggplot(species_rank_pointID, aes(x = as.numeric(rank), y = total_abundance, color = factor(point_id))) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(
    title = "Abundance of High Rank Species across La Mauricie Park",
    x = "Species Rank",
    y = "Abundance",
    color = "Point ID"
  ) +
  scale_color_manual(values = RColorBrewer::brewer.pal(n = length(unique(species_rank_pointID$point_id)), "Set3"))
```
# 7.4 Elevation

NOTE: This result assumes that sampling is random with regards to elevation, which is unlikely. 

```{r}
# Create elevation classes with labels
mauricie_data <- mauricie_data %>%
  mutate(
    elevation_class = case_when(
      elevation < 200 ~ "Low",
      elevation >= 200 & elevation < 400 ~ "Mid",
      elevation >= 400 ~ "High"
    ),
    elevation_class = factor(elevation_class, levels = c("Low", "Mid", "High"))  # Set the factor levels
  )

# Calculate species richness for each elevation class
elevation_summary <- mauricie_data %>%
  group_by(elevation_class) %>%
  summarize(n_species = n_distinct(english_name), .groups = "drop")

# Plot species richness per elevation class
ggplot(elevation_summary, aes(x = elevation_class, y = n_species)) +
  geom_bar(stat = "identity", color = "black", fill = "steelblue") +
  geom_text(
    aes(label = paste0("n = ", n_species)),
    vjust = -0.5, # Position the text above the bars
    size = 3.5    # Adjust text size
  ) +
  theme_minimal() +
  labs(
    title = "Species Richness by Elevation Class",
    x = "Elevation Class",
    y = "Species Richness"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate labels if needed
```

# 7.5 Landscape

Plot the species richness and mean landscape metrics.

```{r}
# Summarize species richness, mean ed, and mean np by point_id 
landscape_summary <- mauricie_data %>%
  group_by(point_id) %>%
  summarize(
    n_species = n_distinct(english_name),  # Species richness
    mean_ed = mean(ed, na.rm = TRUE),      # Mean edge density
    mean_np = mean(np, na.rm = TRUE),      # Mean number of patches
    .groups = "drop"
  )


# Scatterplot: Species richness vs. Mean edge density
ggplot(landscape_summary, aes(x = mean_ed, y = n_species)) +
  geom_point(color = "blue", size = 3, alpha = 0.7) +
  labs(
    title = "Species Richness vs. Edge Density",
    x = "Mean Edge Density",
    y = "Species Richness"
  ) +
  theme_minimal()

# Scatterplot: Species richness vs. Mean number of patches
ggplot(landscape_summary, aes(x = mean_np, y = n_species)) +
  geom_point(color = "green", size = 3, alpha = 0.7) +
  labs(
    title = "Species Richness vs. Number of Patches",
    x = "Mean Number of Patches",
    y = "Species Richness"
  ) +
  theme_minimal()
```

# 7.6  NDVI Plots

Plot the relative species abundance for different NDVI ranges.

```{r}
# Create NDVI classes with numeric labels
mauricie_data <- mauricie_data %>%
  mutate(
    ndvi_range = case_when(
      ndvi > 0 & ndvi <= 0.2 ~ "0 to 0.2",
      ndvi > 0.2 & ndvi <= 0.4 ~ "0.2 to 0.4",
      ndvi > 0.4 & ndvi <= 0.6 ~ "0.4 to 0.6",
      ndvi > 0.6 & ndvi <= 0.8 ~ "0.6 to 0.8",
    )
  )

# Calculate species abundance for each NDVI class
ndvi_summary <- mauricie_data %>%
  group_by(ndvi_range) %>%
  summarize(
    total_individuals = sum(ObservationCount, na.rm = TRUE),
    total_species = n_distinct(scientific_name)
  ) %>%
  mutate(
    relative_abundance = total_individuals / sum(total_individuals)
  )

# Plot relative species abundance per NDVI class
ggplot(ndvi_summary, aes(x = ndvi_range, y = relative_abundance, fill = total_individuals)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(
    aes(label = paste0("n = ", total_individuals)),
    vjust = -0.5, # Position the text above the bars
    size = 3.5    # Adjust text size
  ) +
  scale_fill_gradient(
    low = "lightgreen", high = "darkgreen", # Customize colors for the gradient
    name = "Abundance"                      # Legend title
  ) +
  theme_minimal() +
  labs(
    title = "Relative Species Abundance According to NDVI Distribution",
    x = "NDVI",
    y = "Relative Abundance"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Plot species richness and minimum NDVI grouped by SiteCode.

```{r}
# Calculate species richness by SiteCode
species_richness <- mauricie_data %>%
  group_by(SiteCode) %>% ###Prehaps we can used distinct lat and long??###
  summarise(species_richness = n_distinct(scientific_name))

# Calculate minimum NDVI by SiteCode
min_ndvi <- mauricie_data %>%
  group_by(SiteCode) %>%
  summarise(min_ndvi = min(ndvi, na.rm = TRUE))

# Merge species richness and mean NDVI by SiteCode
species_ndvi_data <- left_join(species_richness, min_ndvi, by = "SiteCode")

# Plot species richness and minimum NDVI
ggplot(species_ndvi_data, aes(x = min_ndvi, y = species_richness)) +
  geom_point() +
  labs(title = "Species Richness and Minimum NDVI at Each Observation Site",
       x = "Min NDVI",
       y = "Bird Species Richness") +
  theme_minimal()
```

------------------------------------------------------------------------

Congratulations! You completed Chapter 7 - Raster Summary Tools. In this chapter, you successfully transformed and combined raster data, performed raster algebra operations, and created summary plot for NatureCounts data using environmental covariates.