From e53cab87576d8a02d5847f028372e1bee8b08b78 Mon Sep 17 00:00:00 2001
From: Rob Gifford
Date: Mon, 16 Sep 2024 14:41:28 +0100
Subject: [PATCH] Web content update

---
 index.html                     |   1 +
 website/html/highlights.html   | 413 ------------------
 website/html/paleoviruses.html | 514 ----------------------
 website/html/tutorial.html     | 763 ---------------------------------
 website/html/viruses.html      | 515 ----------------------
 5 files changed, 1 insertion(+), 2205 deletions(-)
 delete mode 100644 website/html/highlights.html
 delete mode 100644 website/html/paleoviruses.html
 delete mode 100644 website/html/tutorial.html
 delete mode 100644 website/html/viruses.html

diff --git a/index.html b/index.html
index 98a136d8..85232171 100644
--- a/index.html
+++ b/index.html
@@ -19,6 +19,7 @@

Parvovirus-GLUE

Resources for comparative genomic analysis of parvoviruses.

+ User Guide Download GitHub diff --git a/website/html/highlights.html b/website/html/highlights.html deleted file mode 100644 index e30c56ff..00000000 --- a/website/html/highlights.html +++ /dev/null @@ -1,413 +0,0 @@ - - - - - - - - Parvovirus-GLUE by giffordlabcvr: Highlights - - - - - - - - - - - - - - -
- - -

- Parvovirus-GLUE: Highlights -

-
- -

- These highlights pages aim to provide a brief overview of selected data items - contained within the Parvovirus-GLUE project. -

- - -

- EPVs: Dependo.54.Cavia (enRep) -

-
- - -

- - Dependo.54.Cavia (also called enRep) is one of several - dependoparvovirus-derived EPVs - identified in the - germline of guinea pigs (genus Cavia). - We identified two groups of elements that spanned the rep gene. - The first includes enRep sequences from both guinea pig species examined. - The second includes a longer element that spans an entire rep gene. This second - group of elements is much older, being shared across rodent species that - diverged >70 million years ago. - -

- - - -
- - -

EPV-Dependo.54-Cavia and Myosin 9 homolog

- - -
- left to right: Guinea pig, Skeletal muscle fibres, parvovirus virion. -
- - - -
- - - -

- While most of this EPV sequence is degraded, the portions included in the - enRep-Myo9 gene are intact in multiple species of guinea pig (genus Cavia), - consistent with evolution under purifying selection. -

- - -

- The broad expression of enRep-Myo9 mRNA and the conservation of its EPV-derived - regions in multiple species of guinea pig (genus Cavia) indicate that this - host-virus fusion gene encodes a protein with a physiologically relevant role.

- - -

- The viral portions of enRep-Myo9 derive from an ancient dependoparvovirus - (genus Dependoparvovirus) that was incorporated into the genome of caviomorph - rodents >6 million years ago. - -

- -

- Related data items: -

    -
  1. EPV-Dependo.54-Cavia sequence
  2. - -
- -

- - - - - - - - - - -

- Viruses: Hamaparvoviruses -

-
- - -

- - Occasionally, our investigations of WGS databases turn up sequences that do not derive from - endogenous parvoviral elements (EPVs), but instead from infectious viruses - that have contaminated genomic DNA samples. - In 2017 we reported sequences derived from viruses belonging to genus Chaphamaparvovirus - in WGS data of diverse vertebrate species. - Detection and analysis of these sequences indicated that the host range of - 'chappaparvoviruses' (as the group was then known) encompassed a diverse range of - vertebrate species. - -

- - -

- Chaphamaparvoviruses are representatives of a newly described parvovirus subfamily: 'Hamaparvovirinae'. - Although relatively little is known about these viruses (most have only been described at sequence-level) - it is becoming clear that they are very widely distributed among vertebrate species, - and that some are associated with disease. - For example, porcine parvovirus 7 (PPV7) is one of the organisms associated with - Stillbirths Mummification Embryonic Death and Infertility (SMEDI) syndrome in domestic pigs, while - mouse kidney parvovirus - is associated with “inclusion body nephritis/nephropathy” - a disease of immunocompromised laboratory mice.

- - - -
- - -

EPV-Dependo.54-Cavia and Myosin 9 homolog

- - -
- Some of the species in which hamaparvoviruses and/or hamaparvovirus-derived EPVs have been identified. - Left to right: 'Ichthamaparvovirus'-derived EPVs were identified in the tiger-tail seahorse; - Porcine parvovirus 7 is an emerging virus of domestic pigs; - Mouse kidney parvovirus is associated with nephrotic disease in immunosuppressed laboratory mice; - We have identified EPVs derived from unclassified hamaparvovirus-like viruses in - a wide range of invertebrate species.
- - - - -
- - -

- We subsequently performed broader screening in animal genomes and identified - EPV sequences derived from unclassified hamaparvovirus-like viruses in - arthropods and molluscs, as well as an ichthamaparvovirus-derived EPV identified in the - genome of the tigertail seahorse. - Ichthamaparvovirus is the second genus defined in subfamily Hamaparvovirinae. - Officially it contains only a single species, Syngnathus scovelli chapparvovirus (ScChPV), - identified in the gulf pipefish (Syngnathus scovelli). However, - phylogenetic evidence supports the inclusion of 'Ichthyic parvovirus' in this genus. - -

- - - - -

- We recently identified ichthamaparvovirus-derived EPVs in snakes, - providing robust evidence that the host range of this viral genus extends to reptiles. - Furthermore, orthologous copies of this EPV were identified in multiple snake species, - establishing that it integrated into the serpentine germline >50 million years ago. - EPV-Icthama.2-Serpentes thus provides the most robust evidence yet that - - hamaparvoviruses are an ancient lineage - and have been associated with vertebrates throughout their evolution.

- - -

- Related data items: -

    -
  1. Ichthama EPV reference sequence details
  2. -
  3. Ichthama ortholog sequences
  4. -
  5. Chaphamaparvovirus FASTA from WGS
  6. -
  7. Ichthamaparvovirus FASTA from WGS
  8. -
- -

- - - -
-

- EPVs: Amdo.1.Ellobius -

-
- - -

- Amdo.1.Ellobius is an amdoparvovirus-derived EPV, identified in the genome of - the Transcaucasian mole vole (Ellobius lutescens) - - a species of cricetid rodent - inhabiting semi-arid or grassland areas in Central Asia, and notable for - its unusual karyotype: only a single sex chromosome is present - with the Y chromosome having been - eliminated - and all individuals possess a diploid number of 17 - chromosomes. - This interesting characteristic has motivated the sequencing of the E. lutescens - genome, as well as that of a sister species - the northern mole vole - (E. talpinus).

- - - -

- We identified the corresponding empty genomic integration sites - in E. talpinus, indicating that both elements were incorporated into the - E. lutescens germline within the last 10 million years. - Intriguingly, in silico predictions indicated that this replicase could be - expressed as a fusion protein with a partial - MAFG gene product.

- - - -
- - - -

Caucasus

- - - -
- Transcaucasus region - habitat of the Transcaucasian mole vole (Ellobius lutescens). -
- - - -
- - - -

- We identified four further EPVs in mammal and reptile genomes that are intermediate - between amdoparvoviruses and their sister genus (Protoparvovirus) in terms of - their phylogenetic placement and genomic features. In particular, we identified - a genome-length EPV in the genome of a pit viper (Protobothrops mucrosquamatus) - that is intermediate between proto- and amdoparvoviruses. - Notably, it exhibits characteristically amdoparvovirus-like - genome features including: (i) a putative middle ORF gene; (ii) a capsid gene - that lacks a phospholipase A2 (PLA2) domain; (iii) a genome structure consistent - with an amdoparvovirus-like mechanism of capsid gene expression.

- - - -

- More recently, we identified orthologous copies of the Amdoparvovirus-like EPV - in additional snake species, establishing that it integrated into the genome - >100 million years ago. Despite this, some copies encode a replicase gene that - appears to have the potential to express intact protein. - -

- -

- Related data items: -

    -
  1. Ellobius sequence
  2. -
  3. Pit viper sequence
  4. -
  5. Alignment and homology modelling of Pit Viper
  6. -
- -

- - - - -
-

- Related Publications -

-
- -

- - - Hildebrandt E, Penzes J, Gifford RJ, Agbandje-McKenna M, and R Kotin - (2020) -
- Evolution of dependoparvoviruses across geological timescales – implications for design of AAV-based gene therapy vectors. - Virus Evolution - [view] -
-
- - Pénzes JJ, de Souza WM, Agbandje-McKenna M, and RJ Gifford - (2019) -
- An ancient lineage of highly divergent parvoviruses infects both vertebrate and invertebrate hosts. -
- Viruses - [view] -
-
- - Callaway HM, Subramanian S, Urbina C, Barnard K, Dick R, Hafenstein SL, Gifford RJ, and CR Parrish - (2019) -
- Examination and reconstruction of three ancient endogenous parvovirus capsid proteins in rodent genomes. -
- Journal of Virology - [view] -
-
- - Valencia-Herrera I, Cena-Ahumada E, Faunes F, Ibarra-Karmy R, Gifford RJ*, and G Arriagada* - (2019) - *co-corresponding authors -
- Molecular properties and evolutionary origins of a parvovirus-derived myosin fusion gene in guinea pigs. -
- Journal of Virology [view] -
-
- - Roediger B, Lee Q, Tikoo S, Cobbin JCA, Henderson JM, Jormakka M, O'Rourke MB, Padula MP, Pinello N, - Henry M, Wynne M, Santagostino SF, Brayton CF, Rasmussen L, Lisowski L, Tay SS, Harris DC, Bertram JF, - Dowling JP, Bertolino P, Lai JH, Wu W, Bachovchin WW, Wong JJ, Gorrell MD, Shaban B, Holmes EC, Jolly CJ, - Monette S, Weninger W. - (2018) -
- An Atypical Parvovirus Drives Chronic Tubulointerstitial Nephropathy and Kidney Fibrosis. - Cell. [view] -
-
- - Souza WM, Romeiro MF, Fumagalli MJ, Modha S, de Araujo J, Queiroz LH, Durigon EL, Figueiredo LT, Murcia PR, Gifford RJ. - (2017) -
- Chapparvoviruses occur in at least three vertebrate classes and have a broad biogeographic distribution. -
- J Gen Virol. - [view] -
-
- - Pénzes JJ, Marsile-Medun S, Agbandje-McKenna M, and RJ Gifford - (2018) -
- Endogenous amdoparvovirus-related elements reveal insights into the biology and evolution of vertebrate parvoviruses. -
- Virus Evolution - [view] -
-
- -

- - - - - -
- - - - - - - - diff --git a/website/html/paleoviruses.html b/website/html/paleoviruses.html deleted file mode 100644 index dd5c8896..00000000 --- a/website/html/paleoviruses.html +++ /dev/null @@ -1,514 +0,0 @@ - - - - - - - Parvovirus-GLUE by giffordlabcvr: Paleoviruses - - - - - - - - - - - -
- - -

- Endogenous parvovirus (EPV) data in the Parvovirus-GLUE-EVE extension -

-
- -

- Whole genome sequencing has revealed that DNA sequences derived from - parvoviruses are present within vertebrate genomes. These ‘endogenous parvoviral elements’ - (EPVs) are thought to have originated via ‘germline incorporation’ events in - which parvovirus DNA sequences were integrated into chromosomal DNA of germline - cells and subsequently inherited as novel host alleles. -

- -
-

Parvovirus EVEs

-

Parvovirus EVEs

- - -
-

- Some of the species in which we identified novel parvoviruses - and endogenous viral elements (EVEs) derived from parvoviruses. - Top row, left to right: Masai giraffe (Giraffa camelopardalis tippelskirchii), - Tasmanian devil (Sarcophilus harrisii), - elephants (Elephantidae), - chinchilla (Chinchilla lanigera). - Bottom row, left to right: Northern fur seals (Callorhinus ursinus), pit vipers (Crotalinae), - Leadbeater's possum (Gymnobelideus leadbeateri), Transcaucasian mole vole (Ellobius lutescens).

-
- - -
- -

- Analysis of EPVs has proven immensely informative with respect to the long-term - evolutionary history of the Parvoviridae. EPV sequences are in some ways - equivalent to parvovirus ‘fossils’ in that they provide a source of retrospective - information about the distant ancestors of modern parvoviruses. -

- -

- Currently, the distribution and diversity of parvovirus-related sequences - in animal genomes remains incompletely characterized. - Progress in characterising these elements has been hampered by the challenges - encountered attempting to analyse their fragmentary and degenerated sequences. - Parvovirus-GLUE aims to address these issues. - We have incorporated into this project a set of principles for - organising the parvovirus 'fossil record', and a protocol - through which it can be accessed and collaboratively developed. -

- - -

- Please note: links to files on GitHub are mainly designed to indicate where - these files are located within the repository. - To investigate files (e.g. tree files) in the appropriate software context we recommend - downloading the entire repository - and browsing locally. - -

- - -
- -

- Where do the EPV data come from? -

-
- - -

- - EVE sequences were recovered from whole genome sequence (WGS) assemblies - via database-integrated genome screening (DIGS) using the - DIGS tool. -

- - - -

- All data pertaining to this screen are included in this repository. -

- - - - - -
-

- Standardised nomenclature for EPVs -

-
- -

- We have applied a systematic approach to naming EPVs, following a convention - developed for endogenous retroviruses (ERVs). - Each element was assigned a unique identifier (ID) constructed from a defined - set of components.

- - -

EPV Nomenclature

- - -

- The first component is the classifier ‘EPV’ (endogenous parvovirus element). -

- -

- The second component is a composite of two distinct subcomponents separated by a period: - - (i) the name of the EPV group; - (ii) a numeric ID that uniquely identifies the insertion. - The numeric ID is an integer that identifies a unique insertion locus that arose as a - consequence of an initial germline infection. Thus, orthologous copies in different - species are given the same number.

- -

- The third component of the ID defines the set of host species in which the ortholog occurs. -
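
The three-part ID scheme described above can be illustrated with a short parser. This is a minimal sketch in Python and is not part of Parvovirus-GLUE: the `EpvId` type, the `parse_epv_id` function and the hyphen-separated layout (as in `EPV-Dependo.54-Cavia`) are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class EpvId(NamedTuple):
    """Hypothetical container for the three components of an EPV ID."""
    classifier: str  # first component, always "EPV"
    group: str       # second component, part (i): name of the EPV group
    number: int      # second component, part (ii): numeric insertion ID
    host: str        # third component: host taxon in which the ortholog occurs


# Assumed layout: EPV-<group>.<number>-<host>, e.g. "EPV-Dependo.54-Cavia".
_EPV_ID = re.compile(r"^(EPV)-([A-Za-z]+)\.(\d+)-([A-Za-z]+)$")


def parse_epv_id(text: str) -> EpvId:
    """Split an EPV ID into its components, or raise ValueError."""
    match = _EPV_ID.match(text)
    if match is None:
        raise ValueError(f"not a recognised EPV ID: {text!r}")
    classifier, group, number, host = match.groups()
    return EpvId(classifier, group, int(number), host)


if __name__ == "__main__":
    print(parse_epv_id("EPV-Dependo.54-Cavia"))
```

Because orthologous copies share the same number, two IDs with the same `group` and `number` but different `host` values would, under this scheme, refer to the same ancestral insertion.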

- - - -
-

- EPV reference sequences and data -

-
- - -

- We reconstructed reference sequences for EPVs using alignments of EPV - sequences derived from the same initial germline colonisation event - i.e. - orthologous elements in distinct species, and paralogous - elements that have arisen via intragenomic duplication of EPV sequences. -

- - -

- Raw data in tabular format can be found at the following links/directories:

    -
  1. Amdoparvoviruses
  2. -
  3. Erythroparvoviruses
  4. -
  5. Dependoparvoviruses
  6. -
  7. Protoparvoviruses
  8. -
  9. Ichthamaparvoviruses
  10. -
- -

- -

- Nucleotide level data in FASTA format (individual files), can be found at the following links/directories: -

    -
  1. Amdoparvoviruses
  2. -
  3. Erythroparvoviruses
  4. -
  5. Dependoparvoviruses
  6. -
  7. Protoparvoviruses
  8. -
  9. Ichthamaparvoviruses
  10. -
-

- - - - -
-

- Multiple sequence alignments - maps of homology between EPVs and viruses -

-
- -

- - Multiple sequence alignments constructed in this study are linked together using - GLUE's - ‘alignment tree’ - data structure. Alignments in the project include: - -

    -
  1. A single ‘root’ - alignment constructed to represent proposed - homologies between representative members of major parvovirus lineages - (including extinct lineages represented only by EPVs).
  2. -
  3. ‘Genus-level’ - alignments constructed to represent proposed homologies between the genomes of - representative members of specific parvovirus genera and EPV reference sequences. -
  4. ‘Tip’ - alignments in which all taxa are derived from a single EPV lineage.
  5. -
- - -

- - -
- -

- Phylogenetic trees - reconstructed evolutionary relationships -

-
- - -

- - We used GLUE to implement an automated process for deriving midpoint rooted, - annotated trees from the EPV-containing alignments included in our project, - to reconstruct the evolutionary relationships between EPVs and related viruses. -

- - -

- Trees were constructed at distinct taxonomic levels: - -

    -
  1. Recursively populated root phylogeny (Rep)
  2. -
  3. Genus-level phylogenies
  4. -
  5. EPV lineage-level phylogenies
  6. -
- -

- - -
-

- Raw EPV sequences and data -

-
- - -

- These are the raw data generated by - database-integrated genome screening (DIGS). - The tabular files contain information about the genomic location of each EVE. - EVEs were classified by comparison to a polypeptide sequence reference library - designed to represent the known diversity of parvoviruses - this includes extinct - lineages represented only by endogenous viral elements (EVEs). -

- -

- These data were obtained via - DIGS - performed in vertebrate genome assemblies downloaded from - NCBI genomes - (2020-07-15). -

- - -

- Raw data in tabular format can be found at the following links:

    -
  1. Amdoparvoviruses
  2. -
  3. Erythroparvoviruses
  4. -
  5. Dependoparvoviruses
  6. -
  7. Protoparvoviruses
  8. -
  9. Ichthamaparvoviruses
  10. -
- -

- -

- Nucleotide level data in FASTA format (individual files), can be found at the following links: -

    -
  1. Amdoparvoviruses
  2. -
  3. Erythroparvoviruses
  4. -
  5. Dependoparvoviruses
  6. -
  7. Protoparvoviruses
  8. -
  9. Ichthamaparvoviruses
  10. -
-

- - - -
-

- Paleovirus-specific schema extensions -

-
- - -

- The paleovirus component of Parvovirus-GLUE extends GLUE's - core schema - to allow the capture of EVE-specific data. - These schema extensions are defined in - this file - and comprise two additional tables: 'locus_data' and 'refcon_data'. - Both tables are linked to the main 'sequence' table via the 'sequenceID' field.

- - -

- The 'locus_data' table contains EVE locus information: e.g. species, assembly, scaffold, location coordinates. -

- -

- The 'refcon_data' table contains summary information for individual - EVE insertions. It refers to the reference sequences constructed to represent - each insertion, which reflect our best efforts to reconstruct progenitor virus - sequences as they might have looked when they initially integrated into - the germline of ancestral species. -

- - -
-

- Related Publications -

-
- -

- - Campbell M, Loncar S, Kotin R, and RJ Gifford - (2022) -
- Comparative analysis reveals the long-term co-evolutionary history of parvoviruses and vertebrates. -
- PLoS Biology - [view] -
-
- - Hildebrandt E, Penzes J, Gifford RJ, Agbandje-McKenna M, and R Kotin - (2020) -
- Evolution of dependoparvoviruses across geological timescales – implications for design of AAV-based gene therapy vectors. - Virus Evolution - [view] -
-
- - Pénzes JJ, de Souza WM, Agbandje-McKenna M, and RJ Gifford - (2019) -
- An ancient lineage of highly divergent parvoviruses infects both vertebrate and invertebrate hosts. -
- Viruses - [view] -
-
- - - Callaway HM, Subramanian S, Urbina C, Barnard K, Dick R, Hafenstein SL, Gifford RJ, and CR Parrish - (2019) -
- Examination and reconstruction of three ancient endogenous parvovirus capsid proteins in rodent genomes. -
- Journal of Virology - [view] -
-
- - - Kobayashi Y, Shimazu T, Murata K, Itou T, Suzuki Y. - (2019) -
- An endogenous adeno-associated virus element in elephants. -
- Virus Res. Mar;262:10-14 - [view] -
-
- - - Valencia-Herrera I, Cena-Ahumada E, Faunes F, Ibarra-Karmy R, Gifford RJ*, and G Arriagada* - (2019) - *co-corresponding authors -
- Molecular properties and evolutionary origins of a parvovirus-derived myosin fusion gene in guinea pigs. -
- Journal of Virology [view] -
-
- - Pénzes JJ, Marsile-Medun S, Agbandje-McKenna M, and RJ Gifford - (2018) -
- Endogenous amdoparvovirus-related elements reveal insights into the biology and evolution of vertebrate parvoviruses. -
- Virus Evolution - [view] -
-
- - Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford - (2018) -
- GLUE: A flexible software system for virus sequence data. -
- BMC Bioinformatics - [view] -
-
- - Zhu H, Dennis T, Hughes J, and RJ Gifford - (2018) -
- Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. - [preprint] -
-
- - Gifford RJ, Blomberg B, Coffin JM, Fan H, Heidmann T, Mayer J, Stoye J, Tristem M, and WE Johnson - (2018) -
- Nomenclature for endogenous retrovirus (ERV) loci. -
- Retrovirology - [view] -
-
- - Gloria Arriagada and RJ Gifford - (2014) -
- Parvovirus-derived endogenous viral elements in two South American rodent genomes. -
- J. Virol. - [view] -
-
- - Katzourakis A. and RJ. Gifford - (2010) -
- Endogenous viral elements in animal genomes. -
- PLoS Genetics - [view] - -

- - - - - - -
- - - - - - - - diff --git a/website/html/tutorial.html b/website/html/tutorial.html deleted file mode 100644 index c9563827..00000000 --- a/website/html/tutorial.html +++ /dev/null @@ -1,763 +0,0 @@ - - - - - - - Parvovirus-GLUE by giffordlabcvr: An example-based tutorial - parvoviruses and the mink fur industry - - - - - - - - - - - -
- - -

- How to use Parvovirus-GLUE - an example-driven tutorial -

-
- - -

- - This tutorial focuses on carnivore amdoparvovirus 1 - also known as - - Aleutian mink disease virus (AMDV). - - It comprises the following steps: -

    -
  1. Downloading selected sequences from GenBank.
  2. -
  3. Extracting isolate data from GenBank files.
  4. -
  5. Constructing an alignment using a codon-aware method.
  6. -
  7. Mapping feature coverage of sequence members within an alignment.
  8. -
  9. Reconstructing annotated phylogenetic trees.
  10. -
- - -

- - -

- If you're unclear about the ways in which Parvovirus-GLUE might be used, - you may find it useful to skim through this tutorial. -

- -

- Alternatively, if you're committed to using Parvovirus-GLUE, we recommend you investigate the - GLUE example project - before attempting this tutorial. -

- - -
-

- Background 1: Carnivores, parvoviruses and the fur industry -

-
- -

Parvoviruses and the fur industry

- - -
-

- Left to right: - stoat/ermine (Mustela erminea); - Leonardo da Vinci's 'Lady with an Ermine' (1489–1491); - North American mink (Neovison vison); - Women posing with mink furs in the 1930s; -

-
- - - -

- Humans have established relatively close relationships with several carnivore species. - But whereas some are kept as companion animals, others are hunted or farmed for meat or fur. - In Europe, several - mustelid - species have historically been hunted for fur, including - European mink (Mustela lutreola), - ermine - (Mustela erminea) - and European polecat (Mustela putorius). - However, the fur of the - North American mink - (Neovison vison) is considered - superior in quality to all of these.

- -

- Furthermore, while records of efforts to breed ermine (stoats) in captivity exist, - these ventures have apparently been short-lived. By contrast, North American mink - are extensively farmed. - The first mink farms were founded in the 1860s, in Upstate New York, and farming - of American mink was introduced into Europe in the early 1930s. The breeding of - various fur colour mutants led to a boom in the mink industry in the two decades - that followed. -

- -

- One unintended consequence of the post-war boom in mink farming was the - introduction of American mink into Europe (as escaped animals took up residence - in local habitats). American mink (hereafter referred to simply as ‘mink’) - are now a relatively widespread invasive species in Europe. -

- -

- - In the 1930s a severe disease emerged in farmed mink. This disease was - originally identified in the Aleutian mink breed and was consequently named - Aleutian disease (AD). However, it was soon discovered to afflict American mink - in general. - AD is caused by a parvovirus in the genus Amdoparvovirus - called Aleutian mink disease virus (AMDV). This virus was originally considered - a species in its own right, but has recently been reclassified as a sublineage of - carnivore amdoparvovirus 1. - - AD is presently considered the most important infectious disease - affecting farm-raised mink.

- -

-

- - -
-

- Background 2: AMDV-related parvoviruses in wild versus farmed carnivores -

-
- -

- Infection with AMDV - or related amdoparvoviruses - - is apparently widespread in wild mink as well as in farmed - animals. In addition, related viruses have been identified in several other carnivore - species, including gray foxes, skunks, raccoon dogs and red pandas. However, - relatively little is known about the biology of amdoparvovirus infection in the - natural environment, or the broader distribution of amdoparvovirus infections - in wild species. -

- -

- - -

Carnivores, parvoviruses and the fur trade

- - -
-

- Left to right: - gray fox (Urocyon cinereoargenteus); - striped skunk (Mephitis mephitis); - raccoon dog (Nyctereutes procyonoides); - red panda (Ailurus fulgens); -

-
- - - Most importantly, it is not clear whether the pathology of AD in captive mink is - typical of disease that occurs in the wild, or if factors associated with fur - farming somehow enabled the emergence of the disease. -

- - -

- Increased availability - of molecular sequence data means it might now be feasible to gain some insight - into the natural history and evolution of amdoparvoviruses, and this in turn - may shed light on the emergence of AD.

- - -

- In this tutorial we will use the Parvovirus-GLUE project - and published sequence data to investigate AMDV distribution, diversity and evolution.

- - -
-

- 1. Downloading all available AMDV sequences from GenBank -

-
- - - -

- To download all AMDV entries in NCBI GenBank, we - will use a customised version of GLUE's 'ncbiImporter' - - module. Our project-specific configuration of the module can be viewed - here. - Viewing the file, you can probably see that it is configured to download sequences - based on a query phrase: - -

- "Carnivore amdoparvovirus 1"[Organism] AND 200:5000[SLEN] -

- - -

- This 'eSearchTerm' is a standard NCBI Entrez text query that - specifies all GenBank entries labelled "Carnivore amdoparvovirus 1" in the - 'Organism' field and between 200 and 5000 nucleotides (nt) in length. - -
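
To make the structure of the query phrase concrete, the sketch below assembles the same term in Python and embeds it in an NCBI E-utilities ESearch URL. It only builds strings and makes no network request. The helper names `build_search_term` and `build_esearch_url`, and the choice of the `nuccore` database and `retmax` value, are assumptions for illustration; the tutorial does not state how the 'ncbiImporter' module issues its query internally.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_term(organism: str, min_len: int, max_len: int) -> str:
    """Combine an [Organism] filter with a sequence-length [SLEN] range."""
    return f'"{organism}"[Organism] AND {min_len}:{max_len}[SLEN]'


def build_esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Return an ESearch URL for the term (hypothetical helper; no request is sent)."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    term = build_search_term("Carnivore amdoparvovirus 1", 200, 5000)
    print(term)
    print(build_esearch_url(term))
```

Pasting the printed term into the NCBI Nucleotide search box is a quick way to check how many entries a query matches before running an import.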

- -

- To use the module, first initiate GLUE on the command line as follows: - - -

-MyComputer:Parvovirus-GLUE rob$ gluetools.sh 
-GLUE Version 1.1.103
-Copyright (C) 2015-2020 The University of Glasgow
-This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you
-are welcome to redistribute it under certain conditions. For details see
-GNU Affero General Public License v3: http://www.gnu.org/licenses/
-
-Mode path: /
-GLUE>
- - - - Notice from the first line that I'm in the 'Parvovirus-GLUE' directory when - I initiate GLUE. - This means that, by default, it will be my 'working directory' - when I reference - a file from the GLUE console, I'll need to do so relative to this directory. -

- -

- - Next, navigate to the 'parvoviridae' project as shown here: -

- -
-GLUE> project parvoviridae 
-OK
-Mode path: /project/parvoviridae/
- -

- -

- Now create the module using its - configuration file, - which is contained in the Parvovirus-GLUE project, as shown here: - -

-Mode path: /project/parvoviridae/
-GLUE> create module -f modules/build/genus/amdo/amdoNcbiImporterExample.xml
-OK
-(1 Module created) 
- -

- -

- To run the module, execute the following command in the GLUE shell: - -

-Mode path: /project/parvoviridae/
-GLUE> module amdoNcbiImporterExample import
- -

-

- - When I ran this command on the 2nd March 2022, I obtained the following output: - - -

-  ncbiImporterSummaryResult
-  totalMatching: 1777
-  present: 0
-  surplus: 0
-  missing: 1777
-  deleted: 0
-  downloaded: 1777
- - Since the GenBank database is continually expanding, you may find - that you obtain more sequences than this when running the same - command at a later date. Note that next time the module is run, only - the missing sequences will be downloaded. -
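
The incremental behaviour described above can be sketched as a comparison between two sets of accession numbers. The code below is a minimal illustration in Python and is not GLUE's implementation: the function `summarise_sync` is hypothetical, and the meanings given to 'present', 'surplus' and 'missing' are assumptions inferred from the field names in the summary.

```python
from typing import Dict, Iterable


def summarise_sync(matching: Iterable[str], local: Iterable[str]) -> Dict[str, int]:
    """Compare IDs matching the query with IDs already held locally.

    Assumed meanings (not confirmed by the tutorial):
      present - matching IDs that are already stored locally
      surplus - locally stored IDs that no longer match the query
      missing - matching IDs that still need to be downloaded
    """
    matching_ids, local_ids = set(matching), set(local)
    return {
        "totalMatching": len(matching_ids),
        "present": len(matching_ids & local_ids),
        "surplus": len(local_ids - matching_ids),
        "missing": len(matching_ids - local_ids),
    }


if __name__ == "__main__":
    # First run: nothing stored locally, so every match is missing.
    print(summarise_sync(["AF205380", "AF205381", "AF205382"], []))
    # Later run: one new GenBank entry has appeared since the first import.
    print(summarise_sync(
        ["AF205380", "AF205381", "AF205382", "EU652446"],
        ["AF205380", "AF205381", "AF205382"],
    ))
```

Under these assumptions, the output reported above (1777 matching, 0 present, 1777 missing) is what a first run against an empty source would produce.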

- -

- Now export the sequences to a 'source' directory as follows. - - -

-Mode path: /project/parvoviridae
-GLUE> export source ncbi-curated-amdv -p sources/genus/amdo/
-

- - -
-

- 2. Importing AMDV sequences and metadata from GenBank files. -

-
- - -

- Now that we've downloaded a set of AMDV sequences, we can incorporate them into - the Parvovirus-GLUE project. - Load the sequences as follows: - -

-Mode path: /project/parvoviridae
-GLUE> import source sources/genus/amdo/ncbi-curated-amdv/
- -

- - -

- The next step is to incorporate additional, sequence-related data. - To do this, we need to consider the schema of the database - underlying Parvovirus-GLUE, which is defined in - this file.

- - -

DB schema

- - -
-

- Parvovirus-GLUE schema extensions. All GLUE projects have a 'sequence' table. - In Parvovirus-GLUE we've extended this table to capture taxonomic information as - well as some other sequence-associated data fields. In addition, we've defined - a separate 'isolate' table that is linked to the 'sequence' table via the - 'sequenceID' field (always unique to every sequence). The isolate table contains - information specific to the isolate that the sequence was obtained from - e.g. - host species from which the isolate was obtained, isolate name, and spatiotemporal - data associated with isolation. - -

-
- -
- - -
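
The relationship between the 'sequence' and 'isolate' tables can be pictured with a small relational example. The sketch below uses Python's built-in `sqlite3` module and an in-memory database; it is not GLUE's own schema definition. The column names `host_sci_name` and `country` follow the field names used later in this tutorial, while the `length` column and the table layout are simplifying assumptions.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence (
        sequenceID TEXT PRIMARY KEY,  -- unique for every sequence
        length     INTEGER
    );
    CREATE TABLE isolate (
        sequenceID    TEXT PRIMARY KEY REFERENCES sequence(sequenceID),
        host_sci_name TEXT,           -- host species of the isolate
        country       TEXT            -- where the isolate was sampled
    );
    """
)

# One example record, taken from the isolate listing shown in this tutorial.
conn.execute("INSERT INTO sequence VALUES (?, ?)", ("AF205380", 690))
conn.execute(
    "INSERT INTO isolate VALUES (?, ?, ?)",
    ("AF205380", "Mustela lutreola", "Spain"),
)

# Join the two tables on the shared sequenceID field.
rows = conn.execute(
    """
    SELECT s.sequenceID, s.length, i.host_sci_name, i.country
    FROM sequence AS s
    JOIN isolate AS i ON i.sequenceID = s.sequenceID
    """
).fetchall()

if __name__ == "__main__":
    print(rows)
```

Keeping isolate details in a separate table linked by 'sequenceID' means that sequence-level fields and sampling metadata can be populated independently.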

- All AMDV sequences have the same taxonomic information. - This data can be entered on a sequence-by-sequence basis using GLUE commands, but - to make the process more efficient, we can instead use - - a script. -

- - - -
-Mode path: /project/parvoviridae
-GLUE> run script glue/tutorial/exampleSetTaxonomicDataAmdv.js
- - - -

- Before we can import sequence-associated data to the isolate table we need to first - create a link to that table for each sequence we have imported. - This can be accomplished on a sequence-by-sequence basis using GLUE commands. - Alternatively, to make the process more efficient, we can use a script. -

- -

- The script is run as follows: - -

- Mode path: /project/parvoviridae
- GLUE> run script glue/tutorial/exampleLinkIsolateDataAmdv.js
- - - - -

- - An - appropriately configured genbankPopulator module - can now be used to extract sequence and isolate information from GenBank files. - We can use this module to extract useful isolate-related information - that is embedded in the "notes" section of the GenBank file. -

- -
- Mode path: /project/parvoviridae
- GLUE> module parvoGenbankXmlPopulator populate
- - -

- We can now inspect the data in the isolate table via the command line, to - see if it has been extracted as expected: -

- - -
-Mode path: /project/parvoviridae
-GLUE> list sequence sequenceID length isolate.host_sci_name isolate.country -w "name = 'AMDV'"
-
-+============+========+============================+===================================+
-| sequenceID | length |   isolate.host_sci_name    |         isolate.country           |
-+============+========+============================+===================================+
-| AF205380   | 690    | Mustela lutreola           | Spain                             |
-| AF205381   | 687    | Neovison vison             | Spain                             |
-| AF205382   | 690    | Lutra lutra                | Spain                             |
-| EU652446   | 782    | Neovison vison             | China                             |
-| EU652447   | 782    | Neovison vison             | China                             |
-| EU652448   | 785    | Neovison vison             | China                             |
-| EU652449   | 785    | Neovison vison             | China                             |
- -

- - Note that a - 'where clause' - is used to limit the query to AMDV. Where clauses can be used in GLUE to - control how data are selected, and can reference any data field represented - in the underlying project database schema. The schema can be extended with new - fields and tables as required. - -

- - - -
-

- 3. Creating an alignment of all AMDV sequences. -

-
- - - -

- To create an alignment, we first need to create a constrained alignment object, as follows: -

- - -
-Mode path: /project/parvoviridae
-GLUE> create alignment AL_AMDV -r REF_MASTER_Amdo_AMDV
-OK
- - -

- We can now specify which sequences belong to this constrained alignment object, as follows: -

- - -
-Mode path: /project/parvoviridae
-GLUE> alignment AL_AMDV add member --whereClause "source.name = 'ncbi-curated-amdv'"
-OK
- -

- Now we can construct the alignment. - The Parvovirus-GLUE project contains a - parvovirus-specific configuration file - for GLUE's 'compoundAligner' module. -

- -
-GLUE> compute alignment AL_AMDV parvoCompoundAligner
- - -

- Now let's inspect the resulting alignment: -

- -
-GLUE> alignment AL_AMDV show statistics
-OK
-GLUE> project parvoviridae alignment AL_AMDV show statistics
-+======================+=====================+============================+=========================+==========+==========+=============+=============+
-| sequence.source.name | sequence.sequenceID | referenceNtCoveragePercent | memberNtCoveragePercent | minRefNt | maxRefNt | minMemberNt | maxMemberNt |
-+======================+=====================+============================+=========================+==========+==========+=============+=============+
-| ncbi-curated-amdv    | AB044558            | 7.6025827952509895         | 100.0                   | 3043     | 3407     | 1           | 365         |
-| ncbi-curated-amdv    | AB044559            | 7.6025827952509895         | 100.0                   | 3043     | 3407     | 1           | 365         |
-| ncbi-curated-amdv    | AF107626            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
-| ncbi-curated-amdv    | AF107627            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
-| ncbi-curated-amdv    | AF107628            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
-| ncbi-curated-amdv    | AF107629            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
-| ncbi-curated-amdv    | AF107630            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
-| ncbi-curated-amdv    | AF107631            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
-| ncbi-curated-amdv    | AF107632            | 6.998541970422829          | 100.0                   | 587      | 922      | 1           | 336         |
- - -

- (only the first few results are shown here) -
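The coverage percentages in this table can be reproduced by hand. The sketch below assumes each member aligns to the reference as a single ungapped block and that the reference is 4801 nt long; that length is not stated in the tutorial and is inferred here from the table (365 nt corresponds to about 7.6026%). The function name is hypothetical.

```python
# Assumption: reference length inferred from the table, not stated in the text.
REFERENCE_LENGTH = 4801


def reference_coverage_percent(min_ref_nt, max_ref_nt, reference_length):
    """Percent of the reference covered by one ungapped block (1-based, inclusive)."""
    covered = max_ref_nt - min_ref_nt + 1
    return 100.0 * covered / reference_length


# AB044558 spans reference positions 3043-3407 (365 nt).
ab044558 = reference_coverage_percent(3043, 3407, REFERENCE_LENGTH)
# AF107626 spans reference positions 587-922 (336 nt).
af107626 = reference_coverage_percent(587, 922, REFERENCE_LENGTH)
print(round(ab044558, 4), round(af107626, 4))
```

Members aligned in several blocks, or with gaps inside a block, would need their covered positions summed per block rather than taken from the minimum and maximum coordinates.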

- - - -

- Alternatively, list alignment members, along with their associated data, like this: - -

-GLUE> alignment AL_AMDV 
-OK
-Mode path: /project/parvoviridae/alignment/AL_AMDV
-GLUE> list member sequence.sequenceID sequence.name sequence.isolate.isolate
-+=====================+===============+====================================+
-| sequence.sequenceID | sequence.name |      sequence.isolate.isolate      |
-+=====================+===============+====================================+
-| AB182568            | AMDV          | YCC-IN1P                           |
-| AB182569            | AMDV          | YCC-IN2P                           |
-| AB182570            | AMDV          | RRP-JP11P                          |
-| AB182571            | AMDV          | RRP-JP12P                          |
-| AB182572            | AMDV          | BTP-SA11P                          |
- - -

- -

- (only the first few results are shown here) -

- - - -
-

- 4. Mapping feature coverage in AMDV sequences. -

-
- - -

- Now that we have created an alignment of AMDV sequences that is constrained to - the AMDV reference sequence, we can use this alignment in combination with GLUE's - featurePresenceRecorder module to examine feature coverage within individual AMDV - sequences. - -

- -

- - Run the module as follows to generate feature coverage data for the alignment: -

- -
-Mode path: /project/parvoviridae
-GLUE> module parvoFeaturePresenceRecorder record feature-presence AL_AMDV -f whole_genome -d
-
-+=======================+=============================+============================+===================================+=========================+=====================+
-| member.alignment.name | member.sequence.source.name | member.sequence.sequenceID | featureLoc.referenceSequence.name | featureLoc.feature.name | referenceNtCoverage |
-+=======================+=============================+============================+===================================+=========================+=====================+
-| AL_AMDV               | ncbi-curated-amdv           | AB044558                   | REF_MASTER_Amdo_AMDV              | whole_genome            | 7.6025827952509895  |
-| AL_AMDV               | ncbi-curated-amdv           | AB044558                   | REF_MASTER_Amdo_AMDV              | VP1                     | 18.775720164609055  |
-| AL_AMDV               | ncbi-curated-amdv           | AB044559                   | REF_MASTER_Amdo_AMDV              | whole_genome            | 7.6025827952509895  |
-| AL_AMDV               | ncbi-curated-amdv           | AB044559                   | REF_MASTER_Amdo_AMDV              | VP1                     | 18.775720164609055  |
-| AL_AMDV               | ncbi-curated-amdv           | AF107626                   | REF_MASTER_Amdo_AMDV              | whole_genome            | 6.998541970422829   |
-| AL_AMDV               | ncbi-curated-amdv           | AF107626                   | REF_MASTER_Amdo_AMDV              | Rep78                   | 18.95093062605753   |
-| AL_AMDV               | ncbi-curated-amdv           | AF107627                   | REF_MASTER_Amdo_AMDV              | whole_genome            | 6.998541970422829   |
- - - -
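Feature coverage of this kind is, in essence, the overlap between a member's aligned blocks and a feature's coordinates on the reference. The following is a minimal sketch of that arithmetic, not the featurePresenceRecorder implementation; the coordinates are toy values and the function name is hypothetical.

```python
def feature_coverage_percent(segments, feature_start, feature_end):
    """Percent of a feature covered by aligned blocks.

    segments: non-overlapping (start, end) reference coordinates, 1-based inclusive.
    """
    feature_length = feature_end - feature_start + 1
    covered = 0
    for seg_start, seg_end in segments:
        overlap = min(seg_end, feature_end) - max(seg_start, feature_start) + 1
        if overlap > 0:
            covered += overlap
    return 100.0 * covered / feature_length


# Toy feature occupying reference positions 101-300 (200 nt).
partial = feature_coverage_percent([(151, 250)], 101, 300)
nearly_full = feature_coverage_percent([(1, 120), (141, 300)], 101, 300)

# Keep only members meeting an 80% threshold, as the where clauses in this tutorial do.
coverage_by_member = {"member_a": partial, "member_b": nearly_full}
selected = sorted(m for m, pct in coverage_by_member.items() if pct >= 80)
print(partial, nearly_full, selected)
```

Recording these values once per member and feature is what makes later threshold-based selection, such as `ref_nt_coverage_pct >= 80`, a simple lookup.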

- To export and inspect the subset of the alignment selected by a where clause, GLUE's fastaAlignmentExporter module can be used. The command below selects members with at least 80% coverage of Rep78. -

- - -
-Mode path: /project/parvoviridae
-GLUE> module fastaAlignmentExporter export AL_AMDV -r REF_MASTER_Amdo_AMDV -f Rep78 -w "fLocNotes.featureLoc.feature.name = 'Rep78' and fLocNotes.ref_nt_coverage_pct >= 80" -p
- - -

- The exported alignment can be found - here. - -

- - -
-

- 5. Reconstructing phylogenetic trees. -

-
- -

- In the final step in this tutorial, we will reconstruct a phylogeny of AMDV - Rep78 genes using the feature coverage tables we have just generated. - - -

-Mode path: /project/parvoviridae
-GLUE> module raxmlPhylogenyGenerator generate nucleotide phylogeny AL_AMDV -r REF_MASTER_Amdo_AMDV -f Rep78 -w "fLocNotes.featureLoc.feature.name = 'Rep78' and fLocNotes.ref_nt_coverage_pct >= 50" -o trees/example/amdv-all-genbank-rep.tre NEWICK_BOOTSTRAPS
-
- - Before you attempt this, bear in mind that reconstructing this phylogeny will - take some time (e.g. around an hour). - -

- - -

- Export annotations for the trees as follows: - - -

- - - -
-Mode path: /project/parvoviridae
-GLUE> module parvoFigTreeAnnotationExporter export figtree-annotation AL_AMDV -w "fLocNotes.featureLoc.feature.name = 'Rep78' and fLocNotes.ref_nt_coverage_pct >= 80" -f tutorial/AMDV-REP78-80pct-annotations.tsv
- - -

- One way to view these trees together with their annotations is by using Andrew Rambaut's - FigTree - program. -

- -

- - A pdf version of the tree with annotations attached can be found - here. -

- - -

- The tips display country (as extracted from GenBank XML) and are coloured by - isolation host species (Red = undefined, Blue = American mink). - - -

- - -
-

- Concluding remarks. -

-
- - -

- - The aim of this tutorial was to demonstrate how Parvovirus-GLUE provides a flexible, - extensible framework out of which tailored resources can be quickly developed. - - -

- - - -

- We chose the example of AMDV because it presents an interesting use case due to the questions surrounding its origin and spread.
- To present things quickly, this tutorial has passed lightly over the detail of the operations performed at each stage. However, the GLUE console, which features tab completion, should be intuitive to any bioinformatician used to working on the command line. -

- - -

- - - The tutorial shows how Parvovirus-GLUE can support the implementation of a relatively - sophisticated phylogenetic investigation of any chosen parvovirus in a few short steps. - In this example, the use of GLUE allows us to: - -

- - -

- - Moreover, the steps taken to perform this analysis establish an AMDV resource - that can be further extended and developed. - -

- - -

- - - -
-

- Related Publications -

-
- -

- - Canuti M, McDonald E, Graham SM, Rodrigues B, Bouchard E, Neville R, Pitcher M, Whitney HG, and HD Marshall - (2020) -
- Multi-host dispersal of known and novel carnivore amdoparvoviruses. -
- Virus Evolution - [view] -
-
- - - Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford - (2018) -
- GLUE: A flexible software system for virus sequence data. -
- BMC Bioinformatics - [view] -
-
- -

- - - - - -
- - - - - - - - diff --git a/website/html/viruses.html b/website/html/viruses.html deleted file mode 100644 index 15fd32d9..00000000 --- a/website/html/viruses.html +++ /dev/null @@ -1,515 +0,0 @@ - - - - - - - Parvovirus-GLUE by giffordlabcvr: Viruses - - - - - - - - - - - -
- - - - -

- Virus data included in Parvovirus-GLUE -

-
- - -

- This page provides background information on the virus-associated - data items included in the project - information about endogenous parvoviral elements (EPVs) - can be found here. -

- -

- Please note: links to files on GitHub are mainly designed to indicate where - these files are located within the repository. - To investigate files (e.g. tree files) in the appropriate software context we recommend - downloading the entire repository - and browsing locally. - -

- - -

- Parvovirus genome features -

-
- - -

- - Parvoviruses have linear, single-stranded DNA genomes ~5 kilobases (kb) - in length. They are typically very compact and generally exhibit the same basic - genetic organisation comprising two major gene cassettes, one (Rep/NS) that encodes the - non-structural proteins, and another (Cap/VP) that encodes the structural - coat proteins of the virion. -

- -
- - -

CPV genome

- -
-

- A schematic representation of the canine parvovirus (CPV) genome. NS=non-structural; - VP=capsid; PLA2=phospholipase A2; ITR=inverted terminal repeat; Kb=kilobases -

- -
- -
- -

- Some species and genera encode additional polypeptide gene products adjacent to - these genes or overlapping them in alternative reading frames. -

- - - -

- The genome is flanked at the 3' and 5' ends by palindromic inverted terminal repeat (ITR) - sequences that are the only cis elements required for replication. - -

- -

- - Parvovirus-GLUE defines a - standard set of genome features - for parvoviruses and - records the locations - of these genome features on master reference sequences (see below). - -

- - -
-

- Sequences and sequence-associated data -

-
- - -

- The sequence data in this project are organised into multiple distinct sources. - Each source contains data in either GenBank XML or plain FASTA format. - The type of data is indicated by the name of the source (all GenBank XML sources - contain 'ncbi' in the name). -

- - -

- The following NCBI-derived sources are included in the project: - - -

- - -

- - - - -

- GenBank XML files are imported into this project directly from NCBI GenBank - using an appropriately configured version - of GLUE's 'GenBankPopulator' module and are uniquely identified within this project by their - GenBank accession numbers. - -

- -

- Sequences included in this project are linked to auxiliary data in tabular format, as follows: -

- - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Parameter            | Type    | Definition
full_name            | VARCHAR | Full name of the virus this sequence is derived from
name                 | VARCHAR | Abbreviated name of the virus this sequence is derived from
subfamily            | VARCHAR | Taxonomy - virus subfamily
supergenus           | VARCHAR | Taxonomy - virus supergenus (proposed)
genus                | VARCHAR | Taxonomy - virus genus
clade                | VARCHAR | Taxonomy - virus clade
isolate_name         | VARCHAR | Name of the virus isolate this sequence is derived from
isolation_host       | VARCHAR | Species (Latin binomial) virus was isolated from
length               | INTEGER | Length of the sequence
pubmed_id            | INTEGER | PubMed ID of manuscript associated with sequence
gb_create_date       | GenBank | GenBank creation date of the sequence
gb_update_date       | VARCHAR | Date of most recent GenBank update
country              | VARCHAR | Country where virus was isolated
place_sampled        | VARCHAR | Location of sampling (state, region, or city)
collection_year      | INTEGER | Year virus was isolated
collection_month     | VARCHAR | Month virus was isolated
collection_month_day | VARCHAR | Day of month virus was isolated
- - - -
-

- Parvovirus reference sequences -

-
- -

- - For all officially recognised parvoviral genera, we defined a 'master' reference sequence, as follows: - -

- Parvovirinae -

- - - - -

- Hamaparvovirinae -

- - - - - -

- Densoparvovirinae -

- - - -

- - - -

- - We explicitly defined the locations of genome features - on master reference sequences - (see here). - -

- - - -
-

- Multiple sequence alignments (MSAs) -

-
- -

- Multiple sequence alignments (MSAs) are the basic currency of comparative genomic analysis. - MSAs constructed in this study are linked together using - GLUE's constrained MSA tree data structure. -

- - -

- A 'constrained MSA' is an alignment in which the coordinate space is defined by - a selected reference sequence. Where alignment members contain insertions relative - to the reference sequence, the inserted sequences are recorded and stored - (i.e. sequence data is never deleted). -
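One way to picture a constrained MSA is as a set of blocks mapping member coordinates onto reference coordinates, with the full member sequence kept alongside. The sketch below illustrates that idea under this assumption; it is not GLUE's internal representation, and all names and sequences are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedSegment:
    """One ungapped block; all coordinates are 1-based and inclusive."""
    ref_start: int
    ref_end: int
    member_start: int
    member_end: int


def project_to_reference(member_seq, segments, ref_length, gap="-"):
    """Render the member in reference coordinates; insertions are not shown."""
    row = [gap] * ref_length
    for seg in segments:
        length = seg.ref_end - seg.ref_start + 1
        if length != seg.member_end - seg.member_start + 1:
            raise ValueError("segment lengths differ between reference and member")
        for offset in range(length):
            row[seg.ref_start - 1 + offset] = member_seq[seg.member_start - 1 + offset]
    return "".join(row)


def insertions(member_seq, segments):
    """Member-only stretches lying between consecutive aligned blocks."""
    ordered = sorted(segments, key=lambda s: s.member_start)
    found = []
    for left, right in zip(ordered, ordered[1:]):
        if right.member_start > left.member_end + 1:
            found.append(member_seq[left.member_end:right.member_start - 1])
    return found


# Toy member with a 2-nt insertion (member positions 5-6) relative to a 10-nt reference.
member = "ACGTTTGCA"
blocks = [AlignedSegment(3, 6, 1, 4), AlignedSegment(7, 9, 7, 9)]
projected = project_to_reference(member, blocks, ref_length=10)
inserted = insertions(member, blocks)
print(projected, inserted)
```

Because the member sequence is stored whole, the inserted bases remain recoverable even though they have no column in the reference coordinate space.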

- - -

- - GLUE projects have the option of using a data structure called an alignment tree - to link constrained MSAs representing different taxonomic levels, - and we've used this approach in Parvovirus-GLUE. -

- - - -
- - - - -

Parvovirus phylogeny

- - -
- -

- The - phylogenetic tree - shown above, taken from a report by - Pénzes et al. (2020), - shows the evolutionary relationships between currently recognised genera in - the family Parvoviridae. -

- -
- - - - - - -
- -

Alignment tree concept

- - - -
-

- The schematic figure above shows the 'alignment tree' data structure - currently implemented in Parvovirus-GLUE. - For the highest taxonomic levels (i.e. at the root) we aligned only the most - conserved regions of the genome, whereas for the lower - taxonomic levels (i.e. within and below genus level) we aligned complete coding - sequences. - - We used an alignment tree data structure to link these alignments, - via a set of common reference sequences. - The root alignment contains reference sequences for major clades, - whereas all children of the - root inherit at least one reference from their immediate parent. - Thus, all alignments are linked to one another via our chosen set of - master reference sequences. -
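The linking rule described above can be sketched as a small tree in which a child alignment may only be attached if its reference sequence is already a member of the parent alignment. This reading of the rule, and every name below, is an assumption made for illustration rather than a description of GLUE's code.

```python
from dataclasses import dataclass, field


@dataclass
class AlignmentNode:
    """A constrained alignment in a toy alignment tree."""
    name: str
    reference: str
    members: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def add_child(self, child):
        # Assumed rule: the child's reference must already be a member here.
        if child.reference not in self.members:
            raise ValueError(
                f"{child.reference} is not a member of {self.name}"
            )
        self.children.append(child)
        return child


def lineage(node, target_name):
    """Names of alignments on the path from node down to target_name, or None."""
    if node.name == target_name:
        return [node.name]
    for child in node.children:
        sub_path = lineage(child, target_name)
        if sub_path is not None:
            return [node.name] + sub_path
    return None


# Hypothetical names; the real project uses its own master references.
root = AlignmentNode("AL_ROOT", "REF_ROOT", {"REF_ROOT", "REF_SUBFAMILY_A"})
subfamily = root.add_child(
    AlignmentNode("AL_SUBFAMILY_A", "REF_SUBFAMILY_A", {"REF_SUBFAMILY_A", "REF_GENUS_A"})
)
genus = subfamily.add_child(
    AlignmentNode("AL_GENUS_A", "REF_GENUS_A", {"REF_GENUS_A", "SEQ_1"})
)
path = lineage(root, "AL_GENUS_A")
print(path)
```

Since each child's reference appears in its parent, coordinates in any alignment can in principle be related to any other by walking the path between them.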

- -
- - -
- -

- - - Alignments imported into the project include: - -

    -
  1. A ‘root’ alignment constructed to represent homology between the two largest subgroupings in the Parvoviridae.
  2. ‘Subfamily’ alignments constructed to represent proposed homologies between representative members of Parvoviridae subfamilies.
  3. ‘Cross-genus’ alignments constructed to represent proposed homologies between representative members of 'minor' Parvoviridae lineages.
  4. ‘Genus-level’ alignments constructed to represent proposed homologies between the genomes of representative members of specific parvovirus genera.
- -

- - - -
-

- Phylogenetic trees -

-
- - -

- - We used GLUE to implement an automated process for deriving midpoint-rooted, - annotated trees from the alignments included in our project. -

- - -

- Trees were constructed at distinct taxonomic levels: - -

    -
  1. Family-level (root) phylogeny (Rep)
  2. Genus-level phylogenies
- -

- - - - - - - - -
- - - - - - - -