
# All Taxa Starting Points {#All_Taxa}


## Box File Locations


For questions relating to the full zooplankton community, we also requested and received data on all taxa collected as part of the CPR survey, as well as the phytoplankton color index (PCI). These data came from the two management entities, NEFSC and SAHFOS, which were responsible for data collection and management during different periods.

These files were placed within the Climate Change Ecology Lab directory on Box, which is referenced as `ccel_boxpath` in the code:

> **Box/Climate Change Ecology Lab**


## Data Processing Pipeline

The discrete steps in the CPR data processing pipeline for these files are defined using the {targets} package. The following diagram shows the data at each step in the pipeline, from the three individual files to the combined zooplankton dataset.

<!--- Adding a border around html output --->

```{=html}
<!--
<style>
    canvas {
        border: groove;
    }
</style>
--->
```
```{r, message=FALSE}
# Draw the pipeline graph from the project root, where _targets.R lives
withr::with_dir(rprojroot::find_root('_targets.R'), 
                tar_glimpse())
```

The flowchart above shows the discrete steps the data follow in preparation for analyses that operate on anomalies rather than observed densities of zooplankton.

## NOAA/NEFSC Starting Points

Data obtained from NOAA/NEFSC span the period **1961-2013** and were delivered in the following Excel file:

> **Data/NOAA_1961-2013/Gulf of Maine CPR (Feb 14, 2014 update).xlsx**

That Excel file is divided into **two** sheets. One contains data on phytoplankton and the other contains records on zooplankton.

Each Excel sheet has two or more additional header rows that give the identification and development stage of certain taxa. In the R script `13_allspecies_cpr_cleanup.R` these header rows are stripped, and the taxa and their stages are combined into the new column names.
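
The header handling described above can be sketched in base R. This is only an illustration, not the code in `13_allspecies_cpr_cleanup.R`: the function name `flatten_header()`, the toy data frame `raw`, and its values are all hypothetical, and the sketch assumes exactly two header rows (taxon, then stage).

```r
# Hypothetical miniature of a sheet read without column names:
# row 1 = taxon, row 2 = development stage, rows 3+ = counts.
raw <- data.frame(
  V1 = c("station", NA, "1", "2"),
  V2 = c("Calanus", "1-3", "10", "0"),
  V3 = c("Calanus", "4-6", "5", "12"),
  stringsAsFactors = FALSE
)

# Collapse the header rows of each column into one syntactic column name,
# then drop those rows from the body of the table.
flatten_header <- function(sheet, n_header = 2) {
  header <- sheet[seq_len(n_header), , drop = FALSE]
  new_names <- vapply(header, function(col) {
    parts <- col[!is.na(col) & nzchar(col)]
    tolower(gsub("[^A-Za-z0-9]+", "_", paste(parts, collapse = "_")))
  }, character(1))
  body <- sheet[-seq_len(n_header), , drop = FALSE]
  names(body) <- unname(new_names)
  rownames(body) <- NULL
  # Counts arrive as text because they shared a column with the header
  body[] <- lapply(body, type.convert, as.is = TRUE)
  body
}

clean <- flatten_header(raw)
names(clean)
```

With real sheets the number of header rows varies ("two or more"), so `n_header` would need to be set per sheet.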

Taxonomic codes are also checked against MARMAP codes to ensure accuracy: <https://www.nefsc.noaa.gov/nefsc/Narragansett/taxcodesA.html>.

Through email correspondence, it was communicated that the units were abundance per 100 cubic meters.

```{r}
tribble(
  ~"Sheet Number",  ~"Description",  ~"Units",
  #                  |               |        
  
  "Sheet 1",  
  "Phytoplankton taxa densities by silk transect",
  "Abundance / 1m3",
  
  "Sheet 2", 
  "Zooplankton taxa densities by silk transect",
  "Abundance / 100m3"
) %>% gt()
```

### NOAA Abundances and Taxa Keys

After reshaping these NOAA Excel sheets, we are left with four tables:

> gom_noaa_zoo\
> gom_noaa_phyto\
> gom_noaa_zoo_key\
> gom_noaa_phyto_key

The "keys" detail the taxonomic codes in the header, and their associated MARMAP codes. The other targets detail the abundances in a wide form, with columns displaying the abundance for each stage at each CPR station.

## SAHFOS Starting Points

Data obtained from SAHFOS span the period **2013-2017** and were delivered in the following files:

> **Gulf of Maine CPR/SAHFOS-MBA_2013-2017/MC part1.xlsx**\
> **Gulf of Maine CPR/SAHFOS-MBA_2013-2017/MC part 2.xlsx**

Each of these files contains **three** sheets, one for each counting scale of the CPR survey: the phytoplankton, the traverse, and the "eye-count" scales. These three scales are based on the size of the organisms counted. At each scale, a different subset of the silk transect is used when counting the individually identified taxa. Counts from the subsets are then scaled to the entire silk transect and assigned a number on a categorical counting scale. These numbers represent discrete jumps in abundance per transect.

These three counting scales correspond to the following sub-sampling protocols, which save time when counting very small organisms:

1.  **Phyto** - 1/8000th of transect counted
2.  **Traverse** - 1/40th of transect counted
3.  **Eyecount** - full transect counted
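
The arithmetic behind scaling a subsample count to the whole transect can be sketched as follows. The multipliers come directly from the fractions listed above; the names `transect_multiplier` and `scale_to_transect()` are hypothetical, and the sketch deliberately stops before the assignment to a categorical counting scale, which is not reproduced here.

```r
# Inverse of the fraction of the silk transect examined at each scale
transect_multiplier <- c(phyto = 8000, traverse = 40, eyecount = 1)

# Scale raw subsample counts up to an estimate for the full transect
scale_to_transect <- function(count, scale) {
  stopifnot(all(scale %in% names(transect_multiplier)))
  count * unname(transect_multiplier[scale])
}

# Three individuals seen in the 1/40th traverse subsample
scale_to_transect(3, "traverse")
```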

```{r}
tribble(
 ~"Sheet Number",  ~"Description",  ~"Units",
  #                  |               |        
  
  "Sheet 1",  
  "Phytoplankton taxa densities by silk transect",
  "Abundance / Transect",
  
  "Sheet 2", 
  "Zooplankton taxa densities by silk transect",
  "Abundance / Transect",
 
  "Sheet 3", 
  "Eyecount taxa densities by silk transect",
  "Abundance / Transect"
) %>% gt()
```

### SAHFOS Taxa Keys

The import and reshaping of these data are facilitated by taxon keys, which are created from information in the headers of the Excel sheets and exist in the targets pipeline as the following targets:

> sahfos_mc1_taxa\
> sahfos_mc2_taxa

The SAHFOS keys are "long datasets" with only four columns: the measurement level (phyto, traverse, eye), the taxon, the MARMAP number associated with it, and any notes from the header.

### SAHFOS Abundances

Because the SAHFOS data came as different sheets for different observation scales, we processed the data accordingly. Each measurement scale was processed in parallel for the MC1 & MC2 datasets before appending them together. This produced three sets of abundance data: one each for the phyto, traverse, and eye count scales. These were tagged as the following targets:

> sahfos_phyto
>
> sahfos_trav
>
> sahfos_eye

### Resolving Unit Differences

For the SAHFOS data, it was determined that the recorded abundances are in abundance per transect, not abundance per 100 $m^3$ as in the NOAA data.

In this step, these values are converted to the **common unit of zooplankton density**: *abundance per 100 cubic meters of water*.

To convert from the volume of water sampled over one silk transect to 100 $m^3$, the following constants are used:

```{r}
tribble(
  ~"Constant", ~"Value",
  "Distance Traveled by CPR Device for Transect", "10 Nautical Miles",
  "CPR Opening Aperture Size", "1.27 cm x 1.27 cm (1.6129 square cm)",
  "CPR Aperture Size (square meters)", "0.00016129",
  "Meters in 10 Nautical Miles", "18520",
  "Volume Sampled in 10 Nautical Mile Transect", "2.987091 cubic meters",
  "Transect to 100m3 Equation", "(1 / 2.987091) * 100"
) %>%  gt()
```
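
As a worked check of these constants, the sketch below recomputes the sampled volume from first principles, assuming a square aperture 1.27 cm on a side (which gives 0.00016129 square meters and reproduces the 2.987091 cubic meter volume). The variable and function names are hypothetical and are not taken from the pipeline code.

```r
nautical_mile_m    <- 1852                      # meters per nautical mile
transect_length_m  <- 10 * nautical_mile_m      # 18520 m towed per transect
aperture_side_m    <- 1.27 / 100                # 1.27 cm expressed in meters
aperture_area_m2   <- aperture_side_m^2         # 0.00016129 square meters
transect_volume_m3 <- aperture_area_m2 * transect_length_m

# Convert abundance per transect to abundance per 100 cubic meters
transect_to_100m3 <- function(abundance_per_transect) {
  abundance_per_transect / transect_volume_m3 * 100
}

round(transect_volume_m3, 6)
transect_to_100m3(c(1, 5, 50))
```

Because the conversion is a single multiplicative factor, it can be applied to every abundance column at once without changing the relative differences between samples.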

Once abundances were in a common unit, zooplankton at the traverse and eye count scales were combined for a common zooplankton abundance dataset.

> sahfos_zoo_100m

### Resolving Differences in Development Stage Groups 

Uniting the NOAA starting points with their SAHFOS counterparts requires reconciling the different column names, which are a combination of **Taxa** and their **Development Stages**.

Across the two sources different groupings have been used that sometimes overlap and include some or all of another group. Example: Calanus 1-3 & Calanus 1-4 or Calanus 4-6...

Before these tables can be appended, these groupings need to be resolved. This is done using the following two scripts, which were later rewritten as independent function steps for the targets pipeline:

> **15_NOAA_CPR_Cleanup.R**\
> **16_SAHFOS_CPR_Cleanup.R**

The resulting targets that are produced by these steps are:

> sahfos_zoo_renamed\
> noaa_taxa_resolved

## Appending NOAA & SAHFOS Data Sources

Once data from these two sources have been converted to common units of measurement and common taxonomic groupings, they can be appended into one consistent time series. The result is a single continuous record of zooplankton abundance and the accompanying phytoplankton color index records. This dataset is then ready to be converted to seasonal anomalies.

> gom_combined_zooplankton
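
A minimal sketch of the append step is shown below, using base R to stay dependency-free. It is an illustration only: `append_sources()`, the toy tables, their column names, and their values are hypothetical, and the sketch assumes both tables already share identical column sets and a `year` column.

```r
# Hypothetical miniature tables; the real targets have many more columns
noaa <- data.frame(year = c(1961, 2013), calanus_1_4 = c(10, 20), pci = c(1, 2))
sahfos <- data.frame(year = c(2014, 2017), pci = c(2, 1), calanus_1_4 = c(30, 40))

# Append two sources only when their columns agree, then sort by time
append_sources <- function(a, b) {
  if (!setequal(names(a), names(b))) {
    stop("Column names differ; resolve taxa groupings before appending.")
  }
  out <- rbind(a, b[names(a)])   # align column order explicitly
  out <- out[order(out$year), , drop = FALSE]
  rownames(out) <- NULL
  out
}

combined <- append_sources(noaa, sahfos)
combined
```

Failing loudly on mismatched columns, rather than silently filling gaps, is one way to make sure the unit and stage-group steps have both been completed first.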