paulamool edited this page Oct 29, 2019 · 1 revision

Welcome to the PathoPore wiki!

PathoPore - longer sequences and faster analysis with Nextflow

This project aims to

  • streamline the downstream analysis of Nanopore sequencing data using high-performance computing.
  • provide a scalable workflow package built on Nextflow.

Expected Nanopore input data sizes -:

The workflow is expected to handle input data sets of up to 10 TB in size.
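Inputs of this size are usually split into bounded chunks so that each cluster job has a predictable footprint. The following is a minimal Python sketch of one way to do that, not part of PathoPore itself: the function name `batch_by_size` and the `(path, size)` input shape are hypothetical, and a real workflow would more likely delegate this to the workflow engine.

```python
from typing import Iterable, Iterator, List, Tuple


def batch_by_size(
    files: Iterable[Tuple[str, int]], max_bytes: int
) -> Iterator[List[str]]:
    """Group (path, size_in_bytes) pairs into consecutive batches.

    A batch is closed as soon as adding the next file would push its
    total above max_bytes. A single file larger than max_bytes still
    gets a batch of its own rather than being dropped.
    """
    batch: List[str] = []
    total = 0
    for path, size in files:
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(path)
        total += size
    if batch:
        yield batch
```

Each yielded batch could then be submitted as one job. The order of the input files is preserved, so files from the same run stay together.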

Workflow description -:

PathoPore accepts Nanopore output in FAST5 and FASTQ format and demultiplexes it into individual samples. The reads of each sample are quality checked and filtered, then assembled de novo. The filtered reads are mapped back to the quality-assessed assembly to call consensus bases, a step referred to as polishing that improves the quality of the assembly. Finally, the reads are mapped to the polished assembly to call base variants and detect methylated bases.
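The dependencies between the stages described above can be sketched as a small graph. This is an illustrative Python model only, not Nextflow code and not the project's implementation. The stage names are hypothetical labels chosen for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages whose output it needs.
# Stage names are illustrative, not PathoPore identifiers.
STAGES = {
    "demultiplex": set(),
    "quality_check": {"demultiplex"},
    "filter_reads": {"quality_check"},
    "assemble": {"filter_reads"},
    "assess_assembly": {"assemble"},
    "map_to_assembly": {"filter_reads", "assess_assembly"},
    "polish": {"map_to_assembly"},
    "map_to_polished": {"filter_reads", "polish"},
    "call_variants": {"map_to_polished"},
    "call_methylation": {"map_to_polished"},
}


def execution_order(stages):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(stages).static_order())


if __name__ == "__main__":
    for stage in execution_order(STAGES):
        print(stage)
```

Variant calling and methylation detection both depend only on the final mapping, so in this model they are free to run in parallel.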

Outline of workflow tasks - Software tools -:

  1. Long read sequence base calls - guppy
  2. Demultiplexing (optional) - qcat
  3. Read length and quality score distribution - NanoPlot
  4. Read filter (based on score and length) - Filtlong
  5. De novo assembly - pomoxis mini_assemble/canu
  6. Assess assembly quality - QUAST
  7. Map filtered reads to the assembly - pomoxis mini_align/minimap2/bwa
  8. Generate alignment stats - WUB
  9. Index filtered reads - nanopolish index
  10. Call consensus bases - nanopolish variants --consensus/racon/pilon
  11. Map filtered reads to the polished assembly - pomoxis mini_align/minimap2/bwa (possibly optional)
  12. Generate polished assembly - nanopolish vcf2fasta
  13. Call variant bases - nanopolish variants
  14. Detect methylated bases - nanopolish call-methylation
  15. Calculate methylation frequency - nanopolish calculate_methylation_frequency.py
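
To make the read filter in task 4 concrete, here is a minimal Python sketch of filtering FASTQ reads by length and mean quality. It is a simplified stand-in and does not reproduce Filtlong's scoring. The function names and default thresholds are assumptions made for this example, and the parser assumes well-formed four-line FASTQ records with Phred+33 qualities.

```python
import math
from typing import Iterable, Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str  # Phred+33 encoded quality string


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield Read(header.strip()[1:], sequence, quality)


def mean_phred(quality: str) -> float:
    """Average the per-base error probabilities, then convert to Phred."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(char) - 33) / 10) for char in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(
    reads: Iterable[Read], min_length: int = 1000, min_quality: float = 7.0
) -> Iterator[Read]:
    """Keep reads that meet both the length and mean-quality thresholds."""
    for read in reads:
        if len(read.sequence) < min_length:
            continue
        if mean_phred(read.quality) >= min_quality:
            yield read
```

Averaging error probabilities rather than raw Phred scores stops a few high-quality bases from masking many poor ones. Because both functions are generators, reads are processed one at a time and memory use stays flat regardless of file size.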