Dmitry Mozzherin edited this page Aug 27, 2017 · 24 revisions

Scientific Names List Resolver

Introduction

gnlist-resolver-gui, or Scientific Names List Resolver, is an app that lets you upload a file of scientific names and match them against the scientific names in a data set (for example Catalogue of Life, IPNI, or ZooBank).

Features

  • It compares large lists of names (up to 100,000 names) in a single run and returns the result in either CSV or Excel-compatible XLSX format

  • It returns important statistics about each match: whether it was exact or fuzzy, the edit distance for fuzzy matches, a confidence score for the match, and the classification and ID from the other resource (if they are available)

  • For fuzzy matches, the XLSX output highlights the differences between the matched names

Highlights for fuzzy matches

Usage

  • Prepare your file by saving in CSV format using UTF-8 encoding. There are several supported formats for a file, and there is a good chance that you will only need to change headers according to the list of supported terms.

  • When names are given as a single string, adjust the capitalization of the words to follow nomenclatural rules (for example, convert ECHINISCOIDES Sigismundi Groenlandicus KRISTENSEN and HALLAS, 1980 to Echiniscoides sigismundi groenlandicus KRISTENSEN and HALLAS, 1980). Note that author names may stay capitalized.

  • Upload the file

  • Check the headers. The headers recognized by the app appear with a green background. All other headers will be ignored by the matching process. You can delete an erroneous header match or add a new one. Note that there are two possible workflows:

    1. The name is given as a single string (scientificName term is present)
    2. The name is split into parts (genus, specificEpithet terms are present)
  • Pick a source that you want to use for name matching and select other settings, if available

  • Take a break, and watch the statistics of your match update dynamically.

  • When all is done (or after pushing the Cancel button), download the results of the match in CSV or Excel format
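
The capitalization step above can be scripted before upload. The sketch below is one possible approach and is not part of the app. It assumes you already know how many leading words belong to the name itself (three in the example above, more if the name contains a rank marker such as "subsp."). The helper `normalize_capitalization` and its `name_words` parameter are hypothetical names chosen for illustration. Everything after the name words is treated as authorship and left untouched.

```python
def normalize_capitalization(name_string, name_words):
    """Capitalize the first name word, lowercase the other name words,
    and keep the remaining (authorship) words exactly as given.

    name_words is an assumption supplied by the caller: the number of
    leading words that make up the name itself.
    """
    words = name_string.split()
    name_part = [w.lower() for w in words[:name_words]]
    if name_part:
        name_part[0] = name_part[0].capitalize()
    return " ".join(name_part + words[name_words:])


raw = "ECHINISCOIDES Sigismundi Groenlandicus KRISTENSEN and HALLAS, 1980"
fixed = normalize_capitalization(raw, name_words=3)
print(fixed)
```

Because the number of name words varies between rows, a real list would probably need a per-row value or a manual check of the result.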

Supported Terms for the Headers

taxonID

scientificName

scientificNameAuthorship

taxonRank

kingdom

subKingdom

phylum

subPhylum

superClass

class

subClass

cohort

superOrder

order

subOrder

infraOrder

superFamily

family

subFamily

tribe

subTribe

genus

subGenus

section

specificEpithet

subSpecificEpithet

variety

form
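
To check a file's headers before uploading, the list above can be turned into a small lookup. This is a rough sketch, not the app's own logic. The term list is copied from this section only, so the app may accept spellings that are not listed here. The case-insensitive comparison is an assumption suggested by the examples on this page, which mix spellings such as taxonID and TaxonId. The function names `split_headers` and `guess_workflow` are hypothetical.

```python
# Terms copied from the list in this section.
SUPPORTED_TERMS = [
    "taxonID", "scientificName", "scientificNameAuthorship", "taxonRank",
    "kingdom", "subKingdom", "phylum", "subPhylum", "superClass", "class",
    "subClass", "cohort", "superOrder", "order", "subOrder", "infraOrder",
    "superFamily", "family", "subFamily", "tribe", "subTribe", "genus",
    "subGenus", "section", "specificEpithet", "subSpecificEpithet",
    "variety", "form",
]
# Assumption: headers are compared without regard to letter case.
_LOOKUP = {term.lower() for term in SUPPORTED_TERMS}


def split_headers(headers):
    """Return (recognized, ignored) headers, preserving their order."""
    recognized = [h for h in headers if h.strip().lower() in _LOOKUP]
    ignored = [h for h in headers if h.strip().lower() not in _LOOKUP]
    return recognized, ignored


def guess_workflow(headers):
    """Guess which of the two workflows from the Usage section applies."""
    present = {h.strip().lower() for h in headers}
    if "scientificname" in present:
        return "single string"
    if {"genus", "specificepithet"} <= present:
        return "parts"
    return None


headers = ["taxonID", "family", "scientificName", "notes"]
recognized, ignored = split_headers(headers)
print(recognized, ignored, guess_workflow(headers))
```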

Input file format

  • A delimiter-separated text file (CSV) with the field names in the first row
  • Columns can be separated by tabs, commas, or semicolons
  • At least some columns should have recognizable field names; unused fields won't hurt the process
  • Comma-separated or semicolon-separated values need to be enclosed in double quotes if the value itself contains commas or semicolons
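
One way to produce a file that follows these rules is Python's standard `csv` module, which adds the double quotes automatically. The sketch below is only an illustration. The helper `to_csv_text` and the file name `names.csv` are hypothetical, and the rows reuse names from the examples on this page.

```python
import csv
import io

rows = [
    ["taxonID", "scientificName"],
    ["1", "Animalia"],
    ["2", "Macrobiotus echinogenitus subsp. areolatus Murray, 1907"],
]


def to_csv_text(rows, delimiter=","):
    """Serialize rows, quoting only the values that contain the delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)

# Save with UTF-8 encoding, as the Usage section asks.
with open("names.csv", "w", encoding="utf-8", newline="") as handle:
    handle.write(csv_text)
```

Note that `csv.QUOTE_MINIMAL` quotes a value only when it contains the chosen delimiter, a quote character, or a line break. If you switch the delimiter to a semicolon and still want values with commas quoted, `csv.QUOTE_ALL` is the safer choice.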

taxonID kingdom phylum class order family genus species subspecies variety form scientificNameAuthorship scientificName taxonRank

Simplest Example -- only scientificName

scientificName
Animalia
Macrobiotus echinogenitus subsp. areolatus Murray, 1907

taxonID and scientificName Example

taxonID;scientificName
1;Macrobiotus echinogenitus subsp. areolatus Murray, 1907
...
taxonID  scientificName
1        Animalia
2        Macrobiotus echinogenitus subsp. areolatus Murray, 1907

Rank Example

taxonID;scientificName;taxonRank
1;Macrobiotus echinogenitus f. areolatus Murray, 1907;form
...
taxonID  scientificName                                           taxonRank
1        Animalia                                                 kingdom
2        Macrobiotus echinogenitus subsp. areolatus Murray, 1907  subspecies

Family and Authorship Example

taxonID;family;scientificName;scientificNameAuthorship
1;Macrobiotidae;Macrobiotus echinogenitus subsp. areolatus;Murray, 1907
...
taxonID family scientificName scientificNameAuthorship
1 Animalia
2 Macrobiotidae Macrobiotus echinogenitus Murray

Fine-grained Example

TaxonId;kingdom;subkingdom;phylum;subphylum;superclass;class;subclass;cohort;superorder;order;suborder;infraorder;superfamily;family;subfamily;tribe;subtribe;genus;subgenus;section;species;subspecies;variety;form;ScientificNameAuthorship
1;Animalia;;Tardigrada;;;Eutardigrada;;;;Parachela;;;Macrobiotoidea;Macrobiotidae;;;;Macrobiotus;;;harmsworthi;obscurus;;;Dastych, 1985
TaxonId kingdom subkingdom phylum subphylum superclass class subclass cohort superorder order suborder infraorder superfamily family subfamily tribe subtribe genus subgenus section species subspecies variety form ScientificNameAuthorship
136021 Animalia Pogonophora
136022 Animalia Pogonophora Frenulata Webb, 1969
565443 Animalia Tardigrada Eutardigrada Parachela Macrobiotoidea Macrobiotidae Macrobiotus harmsworthi obscurus Dastych, 1985

You can take the example files and modify them to suit your needs.

Output file format

Output includes the following fields:

Field                  Description
taxonID                original ID attached to a name in the checklist
scientificName         name from the checklist
matchedScientificName  name matched from the GN Resolver data source
inputCanonicalForm     canonical form of the input name
matchedCanonicalForm   canonical form of the matched name
editDistance           for fuzzy matching: how many characters differ between the checklist name and the data source name
rank                   rank from the checklist (if it was given or inferred)
matchedRank            corresponding rank from the data source
matchType              what kind of match it is
score                  heuristic score from 0 to 1, where 1 is a good match; a score of 0.5 means the match requires further human investigation
matchTaxonID           the ID of the matched name
classification         a hierarchy path for the matched name
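
Since a score around 0.5 calls for human investigation, it can help to pull those rows out of the downloaded CSV. The sketch below is only an illustration under several assumptions: the output is comma-separated, its column headers are spelled exactly like the field names above, and rows with an empty score should also be reviewed. The function `rows_needing_review`, the 0.5 threshold default, and the sample rows (placeholder names, not real results) are all hypothetical.

```python
import csv
import io


def rows_needing_review(csv_text, threshold=0.5):
    """Return rows whose score is at or below the threshold.

    Rows with an empty score are returned too; treating them as
    needing review is an assumption of this sketch.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        score_text = (row.get("score") or "").strip()
        if not score_text or float(score_text) <= threshold:
            flagged.append(row)
    return flagged


# Invented sample data for illustration only.
sample = (
    "taxonID,scientificName,editDistance,score\n"
    "1,Aus bus,0,0.99\n"
    "2,Aus cus,1,0.5\n"
    "3,Aus dus,,\n"
)

flagged = rows_needing_review(sample)
for row in flagged:
    print(row["taxonID"], row["scientificName"], row["score"])
```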