Pipeline for downloading occurrences for a series of scientific names, including taxonomic matching and problem resolving.

olehprylutskyi/GBIF_occurence_download

GBIF_occurence_download

Pipeline for downloading occurrences for a series of scientific names with known conservation status, including taxonomic matching and problem resolving.

Purpose

The main purpose of this pipeline is to obtain georeferenced occurrences unambiguously associated with the conservation status of the taxa they belong to. Existing tools, such as rgbif, work well for retrieving occurrences of most species, but fail when the GBIF species matching tool (that is, matching against the GBIF Backbone Taxonomy) has difficulty resolving a name.

Another issue with existing tools is that, even if you provide a list of scientific names as input, the occurrences you receive often carry scientific names that differ from the input ones because of synonymy. This makes it difficult to link attributes assigned to the input names with the retrieved occurrences.
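The linking problem can be illustrated with a minimal base R sketch. It is not the pipeline's actual code: the tables, the column names (`verbatim_name`, `taxon_key`, `status`), the taxon names, and the keys are all made-up placeholders. The idea is to keep the input name next to the key it was matched to, and to join occurrences by that key rather than by the name string they come back with.

```r
# Hypothetical input: names as the user supplied them, with an attribute to keep.
input <- data.frame(
  verbatim_name = c("Genus alpha", "Genus beta"),
  status        = c("status A", "status B"),
  stringsAsFactors = FALSE
)

# Hypothetical result of name matching: each input name resolved to a taxon key.
matched <- data.frame(
  verbatim_name = c("Genus alpha", "Genus beta"),
  taxon_key     = c(101L, 102L),
  stringsAsFactors = FALSE
)

# Hypothetical occurrences: the returned scientific names differ from the
# input names (authorship added, or a synonym used), but the key is stable.
occurrences <- data.frame(
  taxon_key      = c(101L, 101L, 102L),
  scientificName = c("Genus alpha Auth.", "Othergenus alpha (Auth.) Auth.",
                     "Genus beta Auth."),
  stringsAsFactors = FALSE
)

# Attach the key to the input attributes, then join occurrences by key.
lookup <- merge(input, matched, by = "verbatim_name")
linked <- merge(occurrences, lookup, by = "taxon_key")

print(linked[, c("scientificName", "verbatim_name", "status")])
```

Joining on the key means every occurrence carries both the name it was returned with and the input name whose status it inherits.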

The pipeline was developed within the project GBIF Viewer: an open web-based biodiversity conservation decision-making tool for policy and governance (The Habitat Foundation and Ukrainian Nature Conservation Group), funded by NLBIF: The Netherlands Biodiversity Information Facility, nlbif2022.014.

Records of species possessing conservation status in Ukraine

How to run the pipeline

This pipeline is 80% automatic, but still requires manual steps to resolve some difficult taxonomic issues. This is because the GBIF Backbone Taxonomy, which is generally used for name matching, may fail for some names, especially for poorly known or ambiguous taxa. For such exceptions, the user needs to manually edit the higherrank.csv file generated by 1_data_preparation.R before running 2_get_gbif_data.R.
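The order of operations described above might look like the following sketch. The script names come from this README, while the `run_step()` helper is hypothetical and only adds a check that the script exists. It assumes the scripts are run from the repository root.

```r
# Hypothetical helper: source one pipeline script, failing early if it is missing.
run_step <- function(script, dir = ".") {
  path <- file.path(dir, script)
  if (!file.exists(path)) {
    stop("Script not found: ", path)
  }
  # chdir = TRUE lets the script resolve relative paths from its own folder.
  source(path, chdir = TRUE)
  invisible(path)
}

# Intended order (uncomment when running inside the repository):
# run_step("1_data_preparation.R")   # generates higherrank.csv
# ... manually review and edit higherrank.csv here ...
# run_step("2_get_gbif_data.R")      # uses the edited higherrank.csv
```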

The GBIF Backbone Taxonomy is updated periodically (once every couple of months). After each update, the list of names that fail automatic matching changes slightly, so the same higherrank.csv cannot be reused for long. To make revising higherrank.csv easier, run the script 1a_update_higherrank.R. It automatically retrieves the data from the previous version of the file (named higherrank_nameVariants_prev.csv) and provides a handy GUI for manually revising only those names whose matching issues appeared for the first time, without leaving your R session.
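The revision idea can be sketched in base R as follows. This is an illustration under assumptions, not the code of 1a_update_higherrank.R: the `carry_over()` helper, the `name` and `resolved_as` columns, and the example rows are hypothetical. Previously resolved names are reused, and only names that are new to the problem list are left for manual review.

```r
# Hypothetical helper: split the current problem list into names already
# resolved in the previous file and names that need manual review.
carry_over <- function(current, previous, key = "name") {
  is_new <- !(current[[key]] %in% previous[[key]])
  list(
    resolved  = previous[previous[[key]] %in% current[[key]], , drop = FALSE],
    to_review = current[is_new, , drop = FALSE]
  )
}

# Made-up example: one name was resolved earlier, one appeared after an update.
previous <- data.frame(
  name        = c("Genus alpha", "Genus gamma"),
  resolved_as = c("Genus alpha Auth.", "Genus gamma Auth."),
  stringsAsFactors = FALSE
)
current <- data.frame(
  name = c("Genus alpha", "Genus delta"),
  stringsAsFactors = FALSE
)

parts <- carry_over(current, previous)
print(parts$to_review)

# In an interactive session, the rows in parts$to_review could then be opened
# in a GUI editor, for example with DataEditR::data_edit(parts$to_review).
```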

Technical details

Input:

  • a CSV file with scientific names, their conservation status, and higher classification.

Output:

  • a simple features spatial data frame, containing georeferenced occurrences unambiguously associated with the conservation status of the taxa they belong to.
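As a rough illustration of the output format, the sketch below turns a small table of occurrences into a simple features object with sf. The rows, the `verbatim_name` and `status` columns, and the coordinates are invented for the example. `decimalLongitude` and `decimalLatitude` are the standard GBIF coordinate fields, and WGS 84 (EPSG:4326) is assumed as the coordinate reference system.

```r
library(sf)

# Made-up occurrences that already carry the conservation status.
occ <- data.frame(
  verbatim_name    = c("Genus alpha", "Genus beta"),
  status           = c("status A", "status B"),
  decimalLongitude = c(30.5, 24.0),
  decimalLatitude  = c(50.4, 49.8),
  stringsAsFactors = FALSE
)

# Convert to an sf object; the coordinate columns become the geometry column.
occ_sf <- st_as_sf(
  occ,
  coords = c("decimalLongitude", "decimalLatitude"),
  crs = 4326
)

print(occ_sf)
```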

Dependencies

  • rgbif for name matching and retrieving occurrences from GBIF
  • dplyr, tidyr, and stringr for data manipulation
  • sf for working with spatial data
  • DataEditR for GUI for data frame revision
  • ggplot2 for visualisation (optional)

Schematic workflow

Figure: workflow diagram (also available as a scalable diagram).
