This pipeline was primarily developed to extract methylation data from ONT reads. We recently added support for processing PacBio data as well.
From modBAM to bedMethyl
This pipeline takes modification-basecalled ONT reads or PacBio HiFi reads with 5mC predictions (modBAM) as input, aligns them to the provided assembly, and then extracts methylation calls into bedMethyl/bedgraph format.
Required inputs:
- ONT/PacBio modBAM
- reference genome for alignment
ONT workflow:
- trim and repair tags of input modBAM (optional)
  - trim and repair workflow:
    - sort modBAM - `samtools sort`
    - convert modBAM to FASTQ - `samtools fastq`
    - trim barcodes and adapters - `porechop`
    - convert trimmed modFASTQ to modBAM - `samtools import`
    - repair MM/ML tags of trimmed modBAM - `modkit repair`
- align to reference (plus sorting and indexing) - `dorado aligner`
  - include alignment summary - `samtools flagstat`
- create bedMethyl - `modkit pileup`
- create bedgraphs (optional)
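The ONT steps above can be sketched as a shell function that prints the per-sample commands in order, so they can be inspected, or piped into `bash`, outside the pipeline. `ont_plan` is a hypothetical helper, not part of nf-methTRAP; the intermediate file names and all tool flags are assumptions based on each tool's public command line rather than the pipeline's own processes, so check them against `--help` for the installed versions. The fifth argument mimics `--no_trim`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the ONT commands for one sample.
# Flags are assumptions based on each tool's public CLI, not nf-methTRAP's code.
# Paths are assumed to contain no whitespace.

ont_plan() {
  local bam="$1" ref="$2" out="${3:-results/ont}" threads="${4:-8}" trim="${5:-yes}"
  local reads="$bam"

  echo "mkdir -p $out"
  if [ "$trim" = yes ]; then
    # modkit repair expects donor (untrimmed) and acceptor (trimmed) BAMs sorted by read name
    echo "samtools sort -n -@ $threads -o $out/donor.bam $bam"
    # -T MM,ML copies the modification tags into the FASTQ header; -0 collects unpaired reads
    echo "samtools fastq -T MM,ML -0 $out/reads.fastq $out/donor.bam"
    echo "porechop -t $threads -i $out/reads.fastq -o $out/trimmed.fastq.gz"
    echo "samtools import -T MM,ML -0 $out/trimmed.fastq.gz -o $out/trimmed.bam"
    echo "samtools sort -n -@ $threads -o $out/acceptor.bam $out/trimmed.bam"
    # transfer valid MM/ML tags from the untrimmed reads onto the trimmed reads
    echo "modkit repair --donor-bam $out/donor.bam --acceptor-bam $out/acceptor.bam --output-bam $out/repaired.bam"
    reads="$out/repaired.bam"
  fi

  # align, coordinate-sort and index, then write the alignment summary
  echo "dorado aligner $ref $reads | samtools sort -@ $threads -o $out/aligned.bam"
  echo "samtools index $out/aligned.bam"
  echo "samtools flagstat $out/aligned.bam > $out/aligned.flagstat"

  # per-site modification calls as bedMethyl
  echo "modkit pileup -t $threads --log-filepath $out/pileup.log $out/aligned.bam $out/pileup.bed"
}
```

For example, `ont_plan sample.bam assembly.fa results/ont 8 > plan.sh` writes the plan to a file for review; passing `no` as the fifth argument skips the trim/repair block and aligns the input modBAM directly.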
PacBio workflow:
- align to reference - `minimap2` or `pbmm2` (default)
  - minimap2 workflow:
    - convert modBAM to FASTQ - `samtools fastq`
    - alignment - `minimap2`
    - sort and index - `samtools sort`
    - alignment summary - `samtools flagstat`
  - pbmm2 workflow:
    - alignment and sorting - `pbmm2`
    - index - `samtools index`
    - alignment summary - `samtools flagstat`
- create bedMethyl - `modkit pileup` (default) or `pb-CpG-tools`
  - pileup with `modkit pileup` is the default setting
  - 2 options for `pb-CpG-tools`:
    - default: `count` mode (no parameters needed)
    - alternatively: `model` mode (see `--model` in the parameter table below)
    - prior to pileup, aligned reads are split by forward/reverse strand
- create bedgraphs (optional)
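Both PacBio alignment routes and the two pileup options can be sketched the same way. `pacbio_plan` and `pacbio_pileup_plan` are hypothetical helpers that only print commands; the flags (`-ax map-hifi` and `-y` for minimap2, `--preset CCS` for pbmm2, `--pileup-mode` for pb-CpG-tools' `aligned_bam_to_cpg_scores`) and the output names are assumptions based on the tools' public documentation, not nf-methTRAP's processes. The strand split relies on the SAM flags 16 (reverse strand) and 4 (unmapped).

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print (do not run) the PacBio commands for one sample.
# Flags are assumptions based on each tool's public CLI, not nf-methTRAP's code.
# Paths are assumed to contain no whitespace.

pacbio_plan() {
  local bam="$1" ref="$2" out="${3:-results/pacbio}" aligner="${4:-pbmm2}" threads="${5:-8}"
  case "$aligner" in
    minimap2|pbmm2) ;;
    *) echo "unknown aligner: $aligner" >&2; return 1 ;;
  esac

  echo "mkdir -p $out"
  if [ "$aligner" = minimap2 ]; then
    # samtools -T puts MM/ML into FASTQ comments, minimap2 -y copies them into the SAM output,
    # samtools sort --write-index sorts and writes a .csi index in one step
    echo "samtools fastq -T MM,ML $bam | minimap2 -t $threads -ax map-hifi -y $ref - | samtools sort -@ $threads --write-index -o $out/aligned.bam"
  else
    # pbmm2 aligns and coordinate-sorts; --preset CCS is assumed for HiFi reads
    echo "pbmm2 align --preset CCS --sort -j $threads $ref $bam $out/aligned.bam"
    echo "samtools index $out/aligned.bam"
  fi
  echo "samtools flagstat $out/aligned.bam > $out/aligned.flagstat"
}

pacbio_pileup_plan() {
  local out="${1:-results/pacbio}" method="${2:-modkit}" mode="${3:-count}" threads="${4:-8}"
  local strand flag
  case "$method" in
    modkit)
      echo "modkit pileup -t $threads --log-filepath $out/pileup.log $out/aligned.bam $out/pileup.bed"
      ;;
    pbcpgtools)
      for strand in fwd rev; do
        # flag 16 = reverse strand, flag 4 = unmapped: forward keeps mapped non-reverse reads
        if [ "$strand" = fwd ]; then flag="-F 20"; else flag="-f 16 -F 4"; fi
        echo "samtools view -b $flag -o $out/aligned.$strand.bam $out/aligned.bam"
        echo "samtools index $out/aligned.$strand.bam"
        # depending on the pb-CpG-tools release, model mode may also need --model <file>
        echo "aligned_bam_to_cpg_scores --bam $out/aligned.$strand.bam --output-prefix $out/pileup.$strand --pileup-mode $mode --threads $threads"
      done
      ;;
    *) echo "unknown pileup method: $method" >&2; return 1 ;;
  esac
}
```

For example, `pacbio_plan hifi.bam assembly.fa results/pacbio minimap2 16` prints the minimap2 route, and `pacbio_pileup_plan results/pacbio pbcpgtools model` prints the strand-split pb-CpG-tools calls.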
To run the pipeline with a samplesheet on biohpc_gen with charliecloud:
```bash
nextflow run nf-methTRAP --samplesheet 'path/to/sample_sheet.csv' \
  --out './results' \
  -profile biohpc_gen,charliecloud
```
Note: The Porechop, Modkit and Dorado containers are hosted in the LRZ GitLab registry. Pulling from it requires authentication, which Nextflow currently does not handle, so these containers need to be pre-pulled, for example:
```bash
ch-image pull --auth gitlab.lrz.de:5005/beckerlab/container-playground/porechop_pigz:4ba2bef9
```
Parameter | Effect |
---|---|
`--samplesheet` | Path to samplesheet |
`--out` | Results directory, default: `./results` |
`--no_trim` | Skip the optional trimming step |
`--aligner` | PacBio aligner: `minimap2`, default: `pbmm2` |
`--pileup_method` | PacBio pileup tool: `pbcpgtools`, default: `modkit` |
`--model` | Pass `--pileup-mode model` to pb-CpG-tools, default: `--pileup-mode count` |
`--bedgraph` | Convert bedMethyl to bedgraph, compatible with methylScore input |
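As a hedged illustration of how the options in the table combine for PacBio data, the hypothetical `methtrap_cmd` helper below assembles (but does not run) a `nextflow run` command. Rejecting `--model` unless `--pileup_method pbcpgtools` is set is an assumption derived from the table, not a check the pipeline is documented to perform; the `biohpc_gen,charliecloud` profile is copied from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) an nf-methTRAP command for PacBio data.
# Arguments after the first four (e.g. --bedgraph) are appended verbatim.

methtrap_cmd() {
  local sheet="$1" aligner="${2:-pbmm2}" pileup="${3:-modkit}" mode="${4:-count}"
  shift $(( $# < 4 ? $# : 4 ))
  local -a args=(nextflow run nf-methTRAP --samplesheet "$sheet" --out ./results)

  case "$aligner" in
    minimap2|pbmm2) args+=(--aligner "$aligner") ;;
    *) printf '%s\n' "--aligner must be minimap2 or pbmm2" >&2; return 1 ;;
  esac
  case "$pileup" in
    modkit|pbcpgtools) args+=(--pileup_method "$pileup") ;;
    *) printf '%s\n' "--pileup_method must be modkit or pbcpgtools" >&2; return 1 ;;
  esac
  # assumption: --model only matters for pb-CpG-tools, since it sets its --pileup-mode
  if [ "$mode" = model ]; then
    [ "$pileup" = pbcpgtools ] || { printf '%s\n' "--model requires --pileup_method pbcpgtools" >&2; return 1; }
    args+=(--model)
  fi

  args+=("$@" -profile biohpc_gen,charliecloud)
  echo "${args[*]}"
}
```

For example, `methtrap_cmd sheet.csv minimap2 pbcpgtools model --bedgraph` prints a command using minimap2, pb-CpG-tools in model mode and bedgraph conversion.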
Samplesheet (.csv) with header `sample,modBam,ref,method`:
Column | Content |
---|---|
sample | Name of the sample |
modBam | Path to basecalled modBAM file |
ref | Path to assembly FASTA file |
method | Sequencing platform: `ont` or `pacbio` |
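A minimal samplesheet following the columns above, plus a quick `awk` check of the header, the column count and the `method` values. The sample names and paths are placeholders, and the lowercase `ont`/`pacbio` spelling is assumed from the table; the pipeline may validate the sheet differently.

```bash
#!/usr/bin/env bash
# Example samplesheet (placeholder names and paths) and a quick format check.

write_example_samplesheet() {
  cat > "$1" <<'EOF'
sample,modBam,ref,method
sampleA,data/sampleA.ont.bam,refs/assembly.fa,ont
sampleB,data/sampleB.hifi.bam,refs/assembly.fa,pacbio
EOF
}

# Exit status is non-zero if the header differs, a row does not have exactly
# four comma-separated fields, or method is not ont/pacbio (assumed lowercase).
check_samplesheet() {
  awk -F, '
    { sub(/\r$/, "") }   # tolerate Windows line endings
    NR == 1 {
      if ($0 != "sample,modBam,ref,method") { print "bad header: " $0 > "/dev/stderr"; bad = 1 }
      next
    }
    NF == 0 { next }     # ignore blank lines
    NF != 4 { printf("line %d: expected 4 columns, got %d\n", NR, NF) > "/dev/stderr"; bad = 1; next }
    $4 != "ont" && $4 != "pacbio" { printf("line %d: method must be ont or pacbio\n", NR) > "/dev/stderr"; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

For example, `check_samplesheet sample_sheet.csv && nextflow run nf-methTRAP --samplesheet sample_sheet.csv ...` only starts the run when the sheet passes the check.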
The outputs will be put into `params.out`, defaulting to `./results`:
```
.
├── ont
│   │
│   ├── trim
│   │   ├── trimmed.fastq.gz
│   │   ├── trimmed.bam
│   │   └── trimmed.log
│   │
│   ├── repair
│   │   ├── repaired.bam
│   │   └── repaired.log
│   │
│   ├── alignment
│   │   ├── aligned.bam
│   │   ├── aligned.bai
│   │   ├── summary.txt
│   │   └── aligned.flagstat
│   │
│   ├── pileup/modkit
│   │   ├── pileup.bed
│   │   └── pileup.log
│   │
│   └── bedgraph
│       └── bedgraphs
│
│
└── pacbio
    │
    ├── aligned_minimap2/ aligned_pbmm2
    │   ├── aligned.bam
    │   ├── aligned.bai/csi
    │   └── aligned.flagstat
    │
    ├── pileup: modkit/pb_cpg_tools
    │   ├── pileup.bed
    │   ├── pileup.log
    │   └── pileup.bw (only pb_cpg_tools)
    │
    └── bedgraph
        └── bedgraphs
```
Convert bedMethyl tables to bedgraphs (compatible with methylScore), filtering out positions with less than 5x coverage.
bedgraph format:
column | name |
---|---|
1 | chrom |
2 | pos1 |
3 | pos2 |
4 | methylation percentage |
5 | modified base coverage |
6 | canonical base coverage |
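A minimal `awk` sketch of the bedMethyl-to-bedgraph conversion described above, assuming `modkit pileup` output. The column positions (4 = modification code, 10 = valid coverage, 11 = percent modified, 12 = modified count, 13 = canonical count), filtering on valid coverage, and keeping only `m` (5mC) rows are assumptions; the pipeline's own conversion, and pb-CpG-tools output, may differ.

```bash
#!/usr/bin/env bash
# Sketch: modkit bedMethyl -> 6-column bedgraph (chrom, pos1, pos2, methylation %,
# modified coverage, canonical coverage), keeping sites with >= MIN_COV valid reads.
# Assumed modkit layout: $4 mod code, $10 Nvalid_cov, $11 percent modified,
# $12 Nmod, $13 Ncanonical. awk's default field splitting accepts both
# tab- and space-separated columns 10-18.

bedmethyl_to_bedgraph() {
  local in="$1" min_cov="${2:-5}" mod_code="${3:-m}"
  awk -v min="$min_cov" -v code="$mod_code" 'BEGIN { OFS = "\t" }
    $4 == code && $10 >= min { print $1, $2, $3, $11, $12, $13 }' "$in"
}
```

For example, `bedmethyl_to_bedgraph pileup.bed > pileup.bedgraph` applies the default 5x cutoff; a second argument changes the cutoff and a third selects another modification code.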
- default PacBio pileup tool -> pb-CpG-tools (model)
- document which parameters & versions are used for some tools
  - minimap2
  - pileup settings (modkit & pb-CpG-tools)