- These highlights pages aim to provide a brief overview of selected data items
- contained within the Parvovirus-GLUE project.
-
-
-
-
- EPVs: Dependo.54.Cavia (enRep)
-
-
-
-
-
-
- Dependo.54.Cavia (also called enRep) is one of several
- dependoparvovirus-derived EPVs
- identified in the
- germline of guinea pigs (genus Cavia).
- We identified two groups of elements that spanned the rep gene.
- The first includes enRep sequences from both guinea pig species examined.
- The second includes a longer element that spans an entire rep gene. This second
- group of elements is much older, being shared across rodent species that
- diverged >70 million years ago.
-
-
-
-
-
-
-
-
-
-
-
-
- left to right: Guinea pig, Skeletal muscle fibres, parvovirus virion.
-
-
-
-
-
-
-
-
-
- While most of this EPV sequence is degraded, the portions included in the
- enRep-Myo9 gene are intact in multiple species of guinea pig (genus Cavia),
- consistent with evolution under purifying selection.
-
-
-
-
- The broad expression of enRep-Myo9 mRNA, and the conservation of its EPV-derived
- regions in multiple species of guinea pig (genus Cavia), indicate that this
- host-virus fusion gene encodes a protein with a physiologically relevant role.
-
-
-
-
- The viral portions of enRep-Myo9 derive from an ancient dependoparvovirus
- (genus Dependoparvovirus) that was incorporated into the genome of caviomorph
- rodents >6 million years ago.
-
-
-
- Occasionally, our investigations of WGS databases turn up sequences that do not derive from
- endogenous parvoviral elements (EPVs), but instead from infectious viruses
- that have contaminated genomic DNA samples.
- In 2017 we reported sequences derived from viruses belonging to genus Chaphamaparvovirus
- in WGS data of diverse vertebrate species.
- Detection and analysis of these sequences indicated that the host range of
- 'chapparvoviruses' (as the group was then known) encompassed a diverse range of
- vertebrate species.
-
-
-
-
-
- Chaphamaparvoviruses are representatives of a newly described parvovirus subfamily: 'Hamaparvovirinae'.
- Although relatively little is known about these viruses (most have only been described at sequence-level)
- it is becoming clear that they are very widely distributed among vertebrate species,
- and that some are associated with disease.
- For example, porcine parvovirus 7 (PPV7) is one of the organisms associated with
- Stillbirth, Mummification, Embryonic Death and Infertility (SMEDI) syndrome in domestic pigs, while
- mouse kidney parvovirus
- is associated with “inclusion body nephritis/nephropathy” - a disease of immunocompromised laboratory mice.
-
-
-
-
-
-
-
-
-
-
-
- Some of the species in which hamaparvoviruses and/or hamaparvovirus-derived EPVs have been identified.
- Left to right: 'Ichthamaparvovirus'-derived EPVs were identified in the tiger tail seahorse;
- Porcine parvovirus 7 is an emerging virus of domestic pigs;
- Mouse kidney parvovirus is associated with nephrotic disease in immunosuppressed laboratory mice;
- We have identified EPVs derived from unclassified hamaparvovirus-like viruses in
- a wide range of invertebrate species.
-
-
-
-
-
-
-
-
-
- We subsequently performed broader screening in animal genomes and identified
- EPV sequences derived from unclassified hamaparvovirus-like viruses in
- arthropods and molluscs, as well as an ichthamaparvovirus-derived EPV identified in the
- genome of the tiger tail seahorse.
- Ichthamaparvovirus is the second genus defined in subfamily Hamaparvovirinae.
- Officially it contains only a single species, Syngnathus scovelli chapparvovirus (ScChPV),
- identified in the gulf pipefish (Syngnathus scovelli). However,
- phylogenetic evidence supports the inclusion of 'Ichthyic parvovirus' in this genus.
-
-
-
-
-
-
-
- We recently identified ichthamaparvovirus-derived EPVs in snakes,
- providing robust evidence that the host range of this viral genus extends to reptiles.
- Furthermore, orthologous copies of this EPV were identified in multiple snake species,
- establishing that it integrated into the serpentine germline >50 million years ago.
- EPV-Icthama.2-Serpentes thus provides the most robust evidence yet that
- hamaparvoviruses are an ancient lineage
- and have been associated with vertebrates throughout their evolution.
-
- Amdo.1.Ellobius is an amdoparvovirus-derived EPV, identified in the genome of
- the Transcaucasian mole vole (Ellobius lutescens)
- - a species of cricetid rodent
- inhabiting semi-arid or grassland areas in Central Asia, and notable for
- its unusual karyotype: only a single sex chromosome is present - with the Y chromosome having been
- eliminated - and all individuals possess a diploid number of 17
- chromosomes.
- This interesting characteristic has motivated the sequencing of the E. lutescens
- genome, as well as that of a sister species - the northern mole vole
- (E. talpinus).
-
-
-
-
-
- We identified the corresponding empty genomic integration sites
- in E. talpinus, indicating that both elements were incorporated into the
- E. lutescens germline within the last 10 million years.
- Intriguingly, in silico predictions indicated that this replicase could be
- expressed as a fusion protein with a partial
- MAFG gene product.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Transcaucasus region - habitat of the Transcaucasian mole vole (Ellobius lutescens).
-
-
-
-
-
-
-
-
-
- We identified four further EPVs in mammal and reptile genomes that are intermediate
- between amdoparvoviruses and their sister genus (Protoparvovirus) in terms of
- their phylogenetic placement and genomic features. In particular, we identified
- a genome-length EPV in the genome of a pit viper (Protobothrops mucrosquamatus)
- that is intermediate between proto- and amdoparvoviruses.
- Notably, it exhibits characteristically amdoparvovirus-like
- genome features including: (i) a putative middle ORF gene; (ii) a capsid gene
- that lacks a phospholipase A2 (PLA2) domain; (iii) a genome structure consistent
- with an amdoparvovirus-like mechanism of capsid gene expression.
-
-
-
-
-
- More recently, we identified orthologous copies of the amdoparvovirus-like EPV
- in additional snake species, establishing that it integrated into the genome
- >100 million years ago. Despite this, some copies encode a replicase gene that
- appears to have the potential to express intact protein.
-
-
- Endogenous parvovirus (EPV) data in the Parvovirus-GLUE-EVE extension
-
-
-
-
- Whole genome sequencing has revealed that DNA sequences derived from
- parvoviruses are present within vertebrate genomes. These ‘endogenous parvoviral elements’
- (EPVs) are thought to have originated via ‘germline incorporation’ events in
- which parvovirus DNA sequences were integrated into chromosomal DNA of germline
- cells and subsequently inherited as novel host alleles.
-
-
-
-
-
-
-
-
-
- Some of the species in which we identified novel parvoviruses
- and endogenous viral elements (EVEs) derived from parvoviruses.
- Top row, left to right: Masai giraffe (Giraffa camelopardalis tippelskirchii),
- Tasmanian devil (Sarcophilus harrisii),
- elephants (Elephantidae),
- chinchilla (Chinchilla lanigera).
- Bottom row, left to right: Northern fur seals (Callorhinus ursinus), pit vipers (Crotalinae),
- Leadbeater's possum (Gymnobelideus leadbeateri), Transcaucasian mole vole (Ellobius lutescens).
-
-
-
-
-
-
-
- Analysis of EPVs has proven immensely informative with respect to the long-term
- evolutionary history of the Parvoviridae. EPV sequences are in some ways
- equivalent to parvovirus ‘fossils’ in that they provide a source of retrospective
- information about the distant ancestors of modern parvoviruses.
-
-
-
- Currently, the distribution and diversity of parvovirus-related sequences
- in animal genomes remains incompletely characterized.
- Progress in characterising these elements has been hampered by the challenges
- encountered attempting to analyse their fragmentary and degenerated sequences.
- Parvovirus-GLUE aims to address these issues.
- We have incorporated into this project a set of principles for
- organising the parvovirus 'fossil record', and a protocol
- through which it can be accessed and collaboratively developed.
-
-
-
-
- Please note: links to files on GitHub are mainly designed to indicate where
- these files are located within the repository.
- To investigate files (e.g. tree files) in the appropriate software context we recommend
- downloading the entire repository
- and browsing locally.
-
-
-
-
-
-
-
- Where do the EPV data come from?
-
-
-
-
-
-
- EVE sequences were recovered from whole genome sequence (WGS) assemblies
- via database-integrated genome screening (DIGS) using the
- DIGS tool.
-
-
-
-
-
- All data pertaining to this screen are included in this repository.
-
-
-
-
-
-
- The complete list of vertebrate genomes screened can be found
- here.
-
-
-
-
- The complete list of invertebrate genomes screened can be found
- here.
-
-
-
- The set of parvovirus polypeptide sequences used as probes can be found
- here.
-
-
-
- The final set of parvovirus and EPV polypeptide sequences used as
- references can be found
-
- here.
-
-
-
- Input parameters for screening using the
- DIGS tool
- can be found
- here.
-
-
-
-
-
-
-
- Standardised nomenclature for EPVs
-
-
-
-
- We have applied a systematic approach to naming EPVs, following a convention
- developed for endogenous retroviruses (ERVs).
- Each element was assigned a unique identifier (ID) constructed from a defined
- set of components.
-
-
-
-
-
-
-
- The first component is the classifier ‘EPV’ (endogenous parvovirus element).
-
-
-
- The second component is a composite of two distinct subcomponents separated by a period:
-
- (i) the name of EPV group;
- (ii) a numeric ID that uniquely identifies the insertion.
- The numeric ID is an integer that identifies a unique insertion locus that arose as a
- consequence of an initial germline infection. Thus, orthologous copies in different
- species are given the same number.
-
-
-
- The third component of the ID defines the set of host species in which the ortholog occurs.
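- The three components described above can be pulled apart programmatically.
- The following is a minimal sketch, assuming the components are joined with hyphens
- as in the identifier 'EPV-Icthama.2-Serpentes' used elsewhere on these pages.
- The names EpvId, parse_epv_id and same_insertion are hypothetical helpers
- introduced for illustration and are not part of GLUE.

```python
import re
from typing import NamedTuple


class EpvId(NamedTuple):
    classifier: str  # first component, always 'EPV'
    group: str       # second component (i): name of the EPV group
    number: int      # second component (ii): numeric ID of the insertion locus
    host: str        # third component: host taxon in which the ortholog occurs


# Assumed layout: <classifier>-<group>.<number>-<host>
EPV_ID_PATTERN = re.compile(r"^(EPV)-([A-Za-z]+)\.(\d+)-([A-Za-z]+)$")


def parse_epv_id(identifier: str) -> EpvId:
    """Split an EPV identifier into its components."""
    match = EPV_ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised EPV identifier: {identifier!r}")
    classifier, group, number, host = match.groups()
    return EpvId(classifier, group, int(number), host)


def same_insertion(first: str, second: str) -> bool:
    """True if two IDs share a group and numeric ID, i.e. are orthologous copies."""
    a, b = parse_epv_id(first), parse_epv_id(second)
    return (a.group, a.number) == (b.group, b.number)


parsed = parse_epv_id("EPV-Icthama.2-Serpentes")
print(parsed.group, parsed.number, parsed.host)
```

- Because orthologous copies share the same group name and number, comparing only
- those two fields is enough to decide whether two IDs describe the same germline
- incorporation event.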
-
-
-
-
-
-
- EPV reference sequences and data
-
-
-
-
-
- We reconstructed reference sequences for EPVs using alignments of EPV
- sequences derived from the same initial germline colonisation event - i.e.
- orthologous elements in distinct species, and paralogous
- elements that have arisen via intragenomic duplication of EPV sequences.
-
-
-
-
- Raw data in tabular format can be found at the following links/directories:
-
- Multiple sequence alignments - maps of homology between EPVs and viruses
-
-
-
-
-
- Multiple sequence alignments constructed in this study are linked together using
- GLUE's
- ‘alignment tree’
- data structure. Alignments in the project include:
-
-
-
A single ‘root’
- alignment constructed to represent proposed
- homologies between representative members of major parvovirus lineages
- (including extinct lineages represented only by EPVs).
-
‘Genus-level’
- alignments constructed to represent proposed homologies between the genomes of
- representative members of specific parvovirus genera and EPV reference sequences.
-
‘Tip’
- alignments in which all taxa are derived from a single EPV lineage.
-
-
-
-
-
-
-
-
-
- Phylogenetic trees - reconstructed evolutionary relationships
-
-
-
-
-
-
- We used GLUE to implement an automated process for deriving midpoint-rooted,
- annotated trees from the EPV-containing alignments included in our project,
- to reconstruct the evolutionary relationships between EPVs and related viruses.
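- Midpoint rooting places the root halfway along the longest tip-to-tip path in a tree.
- The sketch below illustrates the idea on a toy tree stored as a dictionary of
- branch lengths. It is not GLUE's implementation, and the function names and the
- toy tree are assumptions made for illustration only.

```python
def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbour, length in tree[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best


def midpoint(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (node_a, node_b, offset): the root lies on branch node_a-node_b,
    at distance offset from node_a.
    """
    any_node = next(iter(tree))
    tip_a, _, _ = farthest(tree, any_node)        # one end of the longest path
    _, diameter, path = farthest(tree, tip_a)     # the other end, plus the path
    half = diameter / 2.0
    walked = 0.0
    for a, b in zip(path, path[1:]):
        length = tree[a][b]
        if walked + length >= half:
            return a, b, half - walked
        walked += length
    raise ValueError("tree has no branches")


# Toy unrooted tree: tips A, B, C, D and internal nodes X, Y (invented values).
toy_tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 4.0, "D": 1.0},
    "C": {"Y": 4.0},
    "D": {"Y": 1.0},
}
print(midpoint(toy_tree))
```

- In the toy tree the longest path runs between tips B and C (length 7), so the
- root falls on the branch between C and Y, 3.5 units from C.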
-
-
-
-
- Trees were constructed at distinct taxonomic levels:
-
-
- These are the raw data generated by
- database-integrated genome screening (DIGS).
- The tabular files contain information about the genomic location of each EVE.
- EVEs were classified by comparison to a polypeptide sequence reference library
- designed to represent the known diversity of parvoviruses - this includes extinct
- lineages represented only by endogenous viral elements (EVEs).
-
-
-
- These data were obtained via
- DIGS
- performed in vertebrate genome assemblies downloaded from
- NCBI genomes
- (2020-07-15).
-
-
-
-
- Raw data in tabular format can be found at the following links:
-
- The paleovirus component of Parvovirus-GLUE extends GLUE's
- core schema
- to allow the capture of EVE-specific data.
- These schema extensions are defined in
- this file
- and comprise two additional tables: 'locus_data' and 'refcon_data'.
- Both tables are linked to the main 'sequence' table via the 'sequenceID' field.
-
-
-
-
- The 'locus_data' table contains EVE locus information: e.g. species, assembly, scaffold, location coordinates.
-
-
-
- The 'refcon_data' table contains summary information for individual
- EVE insertions. It refers to the reference sequences constructed to represent
- each insertion, which reflect our best efforts to reconstruct progenitor virus
- sequences as they might have looked when they initially integrated into
- the germline of ancestral species.
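- The relationship between these tables can be pictured with a small relational sketch.
- The example below uses Python's built-in sqlite3 module rather than GLUE itself.
- Only the table names and the 'sequenceID' link come from the description above;
- all other column names and the inserted rows are placeholders invented for illustration.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create an in-memory database with the three linked tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sequence (sequenceID TEXT PRIMARY KEY);
        -- Column names other than sequenceID are illustrative assumptions.
        CREATE TABLE locus_data (
            sequenceID TEXT REFERENCES sequence(sequenceID),
            species TEXT, assembly TEXT, scaffold TEXT,
            start_pos INTEGER, end_pos INTEGER
        );
        CREATE TABLE refcon_data (
            sequenceID TEXT REFERENCES sequence(sequenceID),
            summary TEXT
        );
        """
    )
    return conn


def locus_report(conn, sequence_id):
    """Join locus and reference-construct data for one sequence."""
    return conn.execute(
        """
        SELECT s.sequenceID, l.species, l.scaffold, l.start_pos, l.end_pos, r.summary
        FROM sequence AS s
        LEFT JOIN locus_data AS l ON l.sequenceID = s.sequenceID
        LEFT JOIN refcon_data AS r ON r.sequenceID = s.sequenceID
        WHERE s.sequenceID = ?
        """,
        (sequence_id,),
    ).fetchone()


conn = build_schema()
# Placeholder rows, invented purely to exercise the join.
conn.execute("INSERT INTO sequence VALUES ('example-seq-1')")
conn.execute(
    "INSERT INTO locus_data VALUES "
    "('example-seq-1', 'Example species', 'example-assembly', 'scaffold_1', 100, 900)"
)
conn.execute("INSERT INTO refcon_data VALUES ('example-seq-1', 'placeholder summary')")
row = locus_report(conn, "example-seq-1")
print(row)
```

- Keeping locus information and reference-construct summaries in separate tables
- means either can be extended without touching the core 'sequence' table.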
-
-
-
-
-
- Related Publications
-
-
-
-
-
- Campbell M, Loncar S, Kotin R, and RJ Gifford
- (2022)
-
- Comparative analysis reveals the long-term co-evolutionary history of parvoviruses and vertebrates.
-
- PLoS Biology
- [view]
-
-
-
- Hildebrandt E, Penzes J, Gifford RJ, Agbandje-McKenna M, and R Kotin
- (2020)
-
- Evolution of dependoparvoviruses across geological timescales – implications for design of AAV-based gene therapy vectors.
- Virus Evolution
- [view]
-
-
-
- Pénzes JJ, de Souza WM, Agbandje-McKenna M, and RJ Gifford
- (2019)
-
- An ancient lineage of highly divergent parvoviruses infects both vertebrate and invertebrate hosts.
-
- Viruses
- [view]
-
-
-
-
- Callaway HM, Subramanian S, Urbina C, Barnard K, Dick R, Hafenstein SL, Gifford RJ, and CR Parrish
- (2019)
-
- Examination and reconstruction of three ancient endogenous parvovirus capsid proteins in rodent genomes.
-
- Journal of Virology
- [view]
-
-
-
-
- Kobayashi Y, Shimazu T, Murata K, Itou T, Suzuki Y.
- (2019)
-
- An endogenous adeno-associated virus element in elephants.
-
- Virus Res. Mar;262:10-14
- [view]
-
-
-
-
- Valencia-Herrera I, Cena-Ahumada E, Faunes F, Ibarra-Karmy R, Gifford RJ*, and G Arriagada*
- (2019)
- *co-corresponding authors
-
- Molecular properties and evolutionary origins of a parvovirus-derived myosin fusion gene in guinea pigs.
-
- Journal of Virology
- [view]
-
-
-
- Pénzes JJ, Marsile-Medun S, Agbandje-McKenna M, and RJ Gifford
- (2018)
-
- Endogenous amdoparvovirus-related elements reveal insights into the biology and evolution of vertebrate parvoviruses.
-
- Virus Evolution
- [view]
-
-
-
- Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford
- (2018)
-
- GLUE: A flexible software system for virus sequence data.
-
- BMC Bioinformatics
- [view]
-
-
-
- Zhu H, Dennis T, Hughes J, and RJ Gifford
- (2018)
-
- Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database.
- [preprint]
-
-
-
- Gifford RJ, Blomberg B, Coffin JM, Fan H, Heidmann T, Mayer J, Stoye J, Tristem M, and WE Johnson
- (2018)
-
- Nomenclature for endogenous retrovirus (ERV) loci.
-
- Retrovirology
- [view]
-
-
-
- Arriagada G and RJ Gifford
- (2014)
-
- Parvovirus-derived endogenous viral elements in two South American rodent genomes.
-
- J. Virol.
- [view]
-
-
-
- Katzourakis A and RJ Gifford
- (2010)
-
- Endogenous viral elements in animal genomes.
-
- PLoS Genetics
- [view]
-
-
- How to use Parvovirus-GLUE - an example-driven tutorial
-
-
-
-
-
-
- This tutorial focuses on carnivore amdoparvovirus 1 - also known as
-
- Aleutian mink disease virus (AMDV).
-
- It comprises the following steps:
-
-
Downloading selected sequences from GenBank.
-
Extracting isolate data from GenBank files.
-
Constructing an alignment using a codon-aware method.
-
Mapping feature coverage of sequence members within an alignment.
-
Reconstructing annotated phylogenetic trees.
-
-
-
-
-
-
-
- If you're unclear about the ways in which Parvovirus-GLUE might be used,
- you may find it useful to skim through this tutorial.
-
-
-
- Alternatively, if you're committed to using Parvovirus-GLUE, we recommend you investigate the
- GLUE example project
- before attempting this tutorial.
-
-
-
-
-
- Background 1: Carnivores, parvoviruses and the fur industry
-
-
-
-
-
-
-
-
- Left to right:
- stoat/ermine (Mustela erminea);
- Leonardo da Vinci's 'Lady with an Ermine' (1489–1491);
- North American mink (Neovison vison);
- Women posing with mink furs in the 1930s;
-
-
-
-
-
-
- Humans have established relatively close relationships with several carnivore species.
- But whereas some are kept as companion animals, others are hunted or farmed for meat or fur.
- In Europe, several
- mustelid
- species have historically been hunted for fur, including
- European mink (Mustela lutreola),
- ermine
- (Mustela erminea)
- and European polecat (Mustela putorius).
- However, the fur of the
- North American mink
- (Neovison vison) is considered
- superior in quality to all of these.
-
-
-
- Furthermore, while records of efforts to breed ermine (stoats) in captivity exist,
- these ventures have apparently been short-lived. By contrast, North American mink
- are extensively farmed.
- The first mink farms were founded in the 1860s, in Upstate New York, and farming
- of American mink was introduced into Europe in the early 1930s. The breeding of
- various fur colour mutants led to a boom in the mink industry in the two decades
- that followed.
-
-
-
- One unintended consequence of the post-war boom in mink farming was the
- introduction of American mink into Europe (as escaped animals took up residence
- in local habitats). American mink (hereafter referred to simply as ‘mink’)
- are now a relatively widespread invasive species in Europe.
-
-
-
-
- In the 1930s a severe disease emerged in farmed mink. This disease was
- originally identified in the Aleutian mink breed and was consequently named
- Aleutian disease (AD). However, it was soon discovered to afflict American mink
- in general.
- AD is caused by a parvovirus in the genus Amdoparvovirus
- called Aleutian mink disease virus (AMDV). This virus was originally considered
- a species in its own right, but has recently been reclassified as a sublineage of
- carnivore amdoparvovirus 1.
-
- AD is presently considered the most important infectious disease
- affecting farm-raised mink.
-
-
-
-
-
-
-
-
- Background 2: AMDV-related parvoviruses in wild versus farmed carnivores
-
-
-
-
- Infection with AMDV - or related amdoparvoviruses -
- is apparently widespread in wild mink as well as in farmed
- animals. In addition, related viruses have been identified in several other carnivore
- species, including gray foxes, skunks, raccoon dogs and red pandas. However,
- relatively little is known about the biology of amdoparvovirus infection in the
- natural environment, or the broader distribution of amdoparvovirus infections
- in wild species.
-
-
-
-
-
-
-
-
-
-
- Left to right:
- gray fox (Urocyon cinereoargenteus);
- striped skunk (Mephitis mephitis);
- raccoon dog (Nyctereutes procyonoides);
- red panda (Ailurus fulgens);
-
-
-
-
- Most importantly, it is not clear whether the pathology of AD in captive mink is
- typical of disease that occurs in the wild, or if factors associated with fur
- farming somehow enabled the emergence of the disease.
-
-
-
-
- Increased availability
- of molecular sequence data means it might now be feasible to gain some insight
- into the natural history and evolution of amdoparvoviruses, and this in turn
- may shed light on the emergence of AD.
-
-
-
-
- In this tutorial we will use the Parvovirus-GLUE project
- and published sequence data to investigate AMDV distribution, diversity and evolution.
-
-
-
-
-
- 1. Downloading all available AMDV sequences from GenBank
-
-
-
-
-
-
- To download all AMDV entries in NCBI GenBank, we
- will use a customised version of GLUE's 'ncbiImporter'
-
- module. Our project-specific configuration of the module can be viewed
- here.
- Viewing the file, you can see that it is configured to download sequences
- based on a query phrase:
-
-
- "Carnivore amdoparvovirus 1"[Organism] AND 200:5000[SLEN]
-
-
-
-
- This 'eSearchTerm' is a standard NCBI Entrez text query that
- specifies all GenBank entries labelled "Carnivore amdoparvovirus 1" in the
- 'Organism' field and between 200 and 5000 nucleotides (nt) in length.
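- To see what such a query looks like outside GLUE, the sketch below builds the
- corresponding request URL for NCBI's public E-utilities 'esearch' endpoint using
- only the Python standard library. It does not contact NCBI, and it is not a
- description of how the 'ncbiImporter' module works internally. The function name
- build_esearch_url and the choice of the 'nuccore' database are assumptions made
- for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, database: str = "nuccore", retmax: int = 100) -> str:
    """Return an esearch URL for the given Entrez query (no request is sent)."""
    params = {"db": database, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The query phrase quoted above: organism name plus a sequence length range.
query = '"Carnivore amdoparvovirus 1"[Organism] AND 200:5000[SLEN]'
url = build_esearch_url(query)
print(url)

# Decoding the URL recovers the original query, confirming the escaping is lossless.
decoded = parse_qs(urlparse(url).query)
print(decoded["term"][0])
```

- Quotation marks, square brackets and spaces in the query are percent-encoded in
- the URL, which is why the term is passed through urlencode rather than pasted in directly.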
-
-
-
-
- To use the module, first initiate GLUE on the command line as follows:
-
-
-
-MyComputer:Parvovirus-GLUE rob$ gluetools.sh
-GLUE Version 1.1.103
-Copyright (C) 2015-2020 The University of Glasgow
-This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you
-are welcome to redistribute it under certain conditions. For details see
-GNU Affero General Public License v3: http://www.gnu.org/licenses/
-
-Mode path: /
-GLUE>
-
-
-
- Notice from the first line that I'm in the 'Parvovirus-GLUE' directory when
- I initiate GLUE.
- This means that, by default, it will be my 'working directory' - when I reference
- a file from the GLUE console, I'll need to do so relative to this directory.
-
-
-
-
- Next, navigate to the 'parvoviridae' project as shown here:
-
-
- Since the GenBank database is continually expanding, you may find
- that you obtain more sequences than this when running the same
- command at a later date. Note that next time the module is run, only
- the missing sequences will be downloaded.
-
-
-
- Now export the sequences to a 'source' directory as follows.
-
-
-
- The next step is to incorporate additional, sequence-related data.
- To do this, we need to consider the schema of the database
- underlying Parvovirus-GLUE, which is defined in
- this file.
-
-
-
-
-
-
-
-
- Parvovirus-GLUE schema extensions. All GLUE projects have a 'sequence' table.
- In Parvovirus-GLUE we've extended this table to capture taxonomic information as
- well as some other sequence-associated data fields. In addition, we've defined
- a separate 'isolate' table that is linked to the 'sequence' table via the
- 'sequenceID' field (always unique to every sequence). The isolate table contains
- information specific to the isolate that the sequence was obtained from - e.g.
- host species from which the isolate was obtained, isolate name, and spatiotemporal
- data associated with isolation.
-
-
-
-
-
-
-
-
- All AMDV sequences have the same taxonomic information.
- This data can be entered on a sequence-by-sequence basis using GLUE commands, but
- to make the process more efficient, we can instead use
-
- a script.
-
-
-
-
-
-Mode path: /project/parvoviridae
-GLUE> run script glue/tutorial/exampleSetTaxonomicDataAmdv.js
-
-
-
-
- Before we can import sequence-associated data to the isolate table we need to first
- create a link to that table for each sequence we have imported.
- This can be accomplished on a sequence-by-sequence basis using GLUE commands.
- Alternatively, to make the process more efficient, we can use a script.
-
-
-
- The script is run as follows:
-
-
-Mode path: /project/parvoviridae
-GLUE> run script glue/tutorial/exampleLinkIsolateDataAmdv.js
-
-
-
-
-
-
- An
- appropriately configured genbankPopulator module
- can now be used to extract sequence and isolate information from GenBank files.
- We can use this module to extract useful isolate-related information
- that is embedded in the "notes" section of the GenBank file.
-
-
- Note that a
- 'where clause'
- is used to limit the query to AMDV. Where clauses can be used in GLUE to
- control how data are selected, and can reference any data field represented
- in the underlying project database schema. The schema can be extended with new
- fields and tables as required.
-
-
-
-
-
-
-
- 3. Creating an alignment of all AMDV sequences.
-
-
-
-
-
-
- To create an alignment, we need to first create a constrained alignment object as follows:
-
- Now we can construct the alignment.
- The Parvovirus-GLUE project contains a
- parvovirus-specific configuration file
- for GLUE's 'compoundAligner' module.
-
- 4. Mapping feature coverage in AMDV sequences.
-
-
-
-
-
- Now that we have created an alignment of AMDV sequences that is constrained to
- the AMDV reference sequence, we can use this alignment in combination with GLUE's
- featurePresenceRecorder module to examine feature coverage within individual AMDV
- sequences.
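- Conceptually, feature coverage is the proportion of a feature's reference
- coordinates that an alignment member actually spans. The sketch below computes
- this from a list of aligned blocks expressed in reference coordinates. It is an
- illustration of the idea, not the featurePresenceRecorder module itself, and the
- function names, coordinates and member names are invented for the example.

```python
def feature_coverage(segments, feature_start, feature_end):
    """Percentage of a feature's reference positions covered by a member.

    segments: list of (ref_start, ref_end) blocks, 1-based and inclusive.
    """
    covered = set()
    for ref_start, ref_end in segments:
        low = max(ref_start, feature_start)
        high = min(ref_end, feature_end)
        if low <= high:
            covered.update(range(low, high + 1))
    feature_length = feature_end - feature_start + 1
    return 100.0 * len(covered) / feature_length


def members_covering(members, feature_start, feature_end, min_percent):
    """Names of members whose coverage of the feature meets the threshold."""
    return [
        name
        for name, segments in members.items()
        if feature_coverage(segments, feature_start, feature_end) >= min_percent
    ]


# Invented example: one near-complete member and one short fragment.
example_members = {
    "member_full": [(1, 200)],
    "member_partial": [(1, 50), (101, 200)],
    "member_fragment": [(150, 180)],
}
print(feature_coverage(example_members["member_partial"], 1, 200))
print(members_covering(example_members, 1, 200, 70.0))
```

- Filtering on a coverage threshold in this way is what allows fragmentary
- sequences to be excluded before a phylogeny is built from a given genome region.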
-
-
-
-
-
- Run the module as follows to generate feature coverage data for the alignment:
-
- In the final step in this tutorial, we will reconstruct a phylogeny of AMDV
- Rep78 genes using the feature coverage tables we have just generated.
-
-
-
- One way to view these trees together with their annotations is by using Andrew Rambaut's
- FigTree
- program.
-
-
-
-
- A pdf version of the tree with annotations attached can be found
- here.
-
-
-
-
- The tips display country (as extracted from GenBank XML) and are coloured by
- isolation host species (Red = undefined, Blue = American mink).
-
-
-
-
-
-
-
- Concluding remarks.
-
-
-
-
-
-
- The aim of this tutorial was to demonstrate how Parvovirus-GLUE provides a flexible,
- extensible framework out of which tailored resources can be quickly developed.
-
-
-
-
-
-
-
-
-
- We chose the example of AMDV because it presents an interesting use case due to the questions surrounding its origin and spread.
-
- To present things quickly, this tutorial has passed lightly over the detail of the
- operations performed at each stage.
- However, the GLUE console - which features tab completion - should be intuitive to
- any bioinformatician used to working on the command line.
-
-
-
-
-
-
-
- The tutorial shows how Parvovirus-GLUE can support the implementation of a relatively
- sophisticated phylogenetic investigation of any chosen parvovirus in a few short steps.
- In this example, the use of GLUE allows us to:
-
-
-
Efficiently extract isolate-related information from GenBank files.
-
Use AMDV genome feature annotations to implement a codon-aware alignment procedure.
-
Generate feature coverage data for alignment members and use this to guide
- the implementation of MSA partitions
- - this allows efficient screening of problem sequences, so that phylogeny
- builds can be reliably automated.
-
-
-
Use the linked data within Parvovirus-GLUE to export annotations for display on bootstrapped phylogenies.
-
Stay up to date with GenBank - simply re-run the process (including sequence download) to update the analysis output.
-
-
-
-
-
-
- Moreover, the steps taken to perform this analysis establish an AMDV resource
- that can be further extended and developed.
-
-
-
-
-
-
-
-
-
-
- Related Publications
-
-
-
-
-
- Canuti M, McDonald E, Graham SM, Rodrigues B, Bouchard E, Neville R, Pitcher M, Whitney HG, and HD Marshall
- (2020)
-
- Multi-host dispersal of known and novel carnivore amdoparvoviruses.
-
- Virus Evolution
- [view]
-
-
-
-
- Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford
- (2018)
-
- GLUE: A flexible software system for virus sequence data.
-
- BMC Bioinformatics
- [view]
-
-
-
-
- This page provides background information on the virus-associated
- data items included in the project - information about endogenous parvoviral elements (EPVs)
- can be found here.
-
-
-
-
-
-
-
-
- Parvovirus genome features
-
-
-
-
-
-
- Parvoviruses have linear, single-stranded DNA genomes ~5 kilobases (kb)
- in length. They are typically very compact and generally exhibit the same basic
- genetic organisation comprising two major gene cassettes, one (Rep/NS) that encodes the
- non-structural proteins, and another (Cap/VP) that encodes the structural
- coat proteins of the virion.
-
-
-
-
-
-
-
-
-
- A schematic representation of the canine parvovirus (CPV) genome. NS=non-structural;
- VP=capsid; PLA2=phospholipase A2; ITR=inverted terminal repeat; Kb=kilobases
-
-
-
-
-
-
-
- Some species and genera encode additional polypeptide gene products adjacent to
- these genes or overlapping them in alternative reading frames.
-
-
-
-
-
- The genome is flanked at the 3' and 5' ends by palindromic inverted terminal repeat (ITR)
- sequences that are the only cis elements required for replication.
-
-
- The sequence data in this project are organised into multiple distinct sources.
- Each source contains data in either GenBank XML or plain FASTA format.
- The type of data is indicated by the name of the source (all GenBank XML sources
- contain 'ncbi' in the name).
-
-
-
-
- The following NCBI-derived sources are included in the project:
-
-
-
-
ncbi-refseqs: Core project master reference sequences (one per parvovirus genus)
-
- We explicitly defined the locations of genome features
- on master reference sequences
- (see here).
-
-
-
-
-
-
-
- Multiple sequence alignments (MSAs)
-
-
-
-
- Multiple sequence alignments (MSAs) are the basic currency of comparative genomic analysis.
- MSAs constructed in this study are linked together using
- GLUE's constrained MSA tree data structure.
-
-
-
-
- A 'constrained MSA' is an alignment in which the coordinate space is defined by
- a selected reference sequence. Where alignment members contain insertions relative
- to the reference sequence, the inserted sequences are recorded and stored
- (i.e. sequence data is never deleted).
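- One way to picture a constrained MSA is as a set of aligned blocks, each pairing a
- range of reference coordinates with a range of member coordinates. The sketch
- below assumes that representation. The class and function names are hypothetical
- and the coordinates are invented; it is meant only to show why insertions need
- no columns of their own in the reference coordinate space.

```python
from typing import List, NamedTuple, Optional, Tuple


class AlignedSegment(NamedTuple):
    ref_start: int
    ref_end: int
    member_start: int
    member_end: int


def ref_to_member(segments: List[AlignedSegment], ref_pos: int) -> Optional[int]:
    """Map a reference coordinate to the member coordinate aligned to it."""
    for seg in segments:
        if seg.ref_start <= ref_pos <= seg.ref_end:
            return seg.member_start + (ref_pos - seg.ref_start)
    return None  # the member has no base aligned at this reference position


def unaligned_member_regions(
    segments: List[AlignedSegment], member_length: int
) -> List[Tuple[int, int]]:
    """Member regions with no reference coordinate (insertions, unaligned ends)."""
    regions = []
    next_pos = 1
    for seg in sorted(segments, key=lambda s: s.member_start):
        if seg.member_start > next_pos:
            regions.append((next_pos, seg.member_start - 1))
        next_pos = max(next_pos, seg.member_end + 1)
    if next_pos <= member_length:
        regions.append((next_pos, member_length))
    return regions


# Invented member carrying a 6-nt insertion after reference position 100.
member_segments = [
    AlignedSegment(1, 100, 1, 100),
    AlignedSegment(101, 200, 107, 206),
]
print(ref_to_member(member_segments, 150))
print(unaligned_member_regions(member_segments, 206))
```

- The inserted bases (member positions 101 to 106 here) stay in the stored member
- sequence and can be recovered at any time, even though they map to no reference position.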
-
-
-
-
-
- GLUE projects have the option of using a data structure called an alignment tree
- to link constrained MSAs representing different taxonomic levels,
- and we've used this approach in Parvovirus-GLUE.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- The
- phylogenetic tree
- shown above, taken from a report by
- Pénzes et al. (2020),
- shows the evolutionary relationships between currently recognised genera in
- the family Parvoviridae.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- The schematic figure above shows the 'alignment tree' data structure
- currently implemented in Parvovirus-GLUE.
- For the highest taxonomic levels (i.e. at the root) we aligned only the most
- conserved regions of the genome, whereas for the lower
- taxonomic levels (i.e. within and below genus level) we aligned complete coding
- sequences.
-
- We used an alignment tree data structure to link these alignments,
- via a set of common reference sequences.
- The root alignment contains reference sequences for major clades,
- whereas all children of the
- root inherit at least one reference from their immediate parent.
- Thus, all alignments are linked to one another via our chosen set of
- master reference sequences.
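- The linking rule described above can be sketched as a small tree structure in
- which a child alignment may only be attached if its reference sequence is a
- member of the parent alignment. This is an illustrative model rather than
- GLUE's own code, and the class name and the sequence and alignment names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class AlignmentNode:
    name: str
    reference: str                                  # constraining reference sequence
    members: List[str] = field(default_factory=list)
    parent: Optional["AlignmentNode"] = None
    children: List["AlignmentNode"] = field(default_factory=list)

    def add_child(self, child: "AlignmentNode") -> "AlignmentNode":
        """Attach a child whose reference is inherited from this alignment."""
        if child.reference not in self.members:
            raise ValueError(
                f"{child.reference!r} is not a member of alignment {self.name!r}"
            )
        child.parent = self
        self.children.append(child)
        return child

    def path_to_root(self) -> List[str]:
        """Names of the alignments linking this node to the root."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path


# Placeholder names: a root, one genus-level alignment and one tip alignment.
root = AlignmentNode("root", "REF_ROOT", ["REF_ROOT", "REF_GENUS_A", "REF_GENUS_B"])
genus_a = root.add_child(
    AlignmentNode("genus_a", "REF_GENUS_A", ["REF_GENUS_A", "REF_TIP_1"])
)
tip_1 = genus_a.add_child(AlignmentNode("tip_1", "REF_TIP_1", ["REF_TIP_1"]))
print(tip_1.path_to_root())
```

- Because each child's reference is also a member of its parent, coordinates in
- any alignment can in principle be traced upwards through the chain of shared references.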
-
-
-
-
-
-
-
-
-
-
- Alignments imported into the project include:
-
-
-
-
A ‘root’
- alignment constructed to represent homology between the two largest subgroupings in the Parvoviridae.
-
‘subfamily’
- alignments constructed to represent proposed
- homologies between representative members of Parvoviridae subfamilies
-
-
‘cross-genus’
- alignments constructed to represent proposed
- homologies between representative members of 'minor' Parvoviridae lineages
-
-
‘genus-level’
- alignments constructed to represent proposed homologies between the genomes of
- representative members of specific parvovirus genera.
-
-
-
-
-
-
-
-
- Phylogenetic trees
-
-
-
-
-
-
- We used GLUE to implement an automated process for deriving midpoint-rooted,
- annotated trees from the alignments included in our project.
-
-
-
-
- Trees were constructed at distinct taxonomic levels:
-
-