EpiDiverse/snp is a bioinformatics analysis pipeline for calling single nucleotide polymorphisms (SNPs) from bisulfite sequencing data, and/or for clustering samples (e.g. environmental plant samples) according to their methylation profiles while masking genomic variation.
The workflow pre-processes a collection of BAM files from the EpiDiverse/WGBS pipeline using samtools, then masks genomic and/or bisulfite variation relative to the reference using custom scripts. Genomic-masked alignments are extracted into FASTQ format and tested for k-mer diversity using kWIP in order to cluster groups. Bisulfite-masked alignments are taken forward for variant calling with Freebayes, followed by post-call filtering with bcftools.
See the output documentation for more details of the results.
The pipeline is built using Nextflow, a workflow tool for running tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, which make installation straightforward and results highly reproducible.
- Install nextflow
- Install one of docker, singularity or conda
- Start running your own analysis!
NXF_VER=20.07.1 nextflow run epidiverse/snp -profile <docker|singularity|conda> \
--input /path/to/wgbs/bam --reference /path/to/reference.fa
See the usage documentation for all of the available options when running the pipeline.
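Before launching a run, it can help to confirm that the prerequisites listed above are actually installed. The following is a minimal sketch, not part of the pipeline itself: the helper names `first_available` and `print_run_command` are hypothetical, and the script only prints the run command shown above rather than executing it. It assumes that the name of the installed tool (docker, singularity or conda) matches the profile name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first of the given command names
# that the shell can resolve, or return non-zero if none is found.
first_available() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print (not execute) the run command from this
# README for a given profile, input directory and reference file.
print_run_command() {
  local profile=$1 input=$2 reference=$3
  printf 'NXF_VER=20.07.1 nextflow run epidiverse/snp -profile %s --input %s --reference %s\n' \
    "$profile" "$input" "$reference"
}

# Warn if nextflow itself is missing.
command -v nextflow >/dev/null 2>&1 || echo "nextflow not found" >&2

# Pick whichever of docker, singularity or conda is found first.
if profile=$(first_available docker singularity conda); then
  print_run_command "$profile" /path/to/wgbs/bam /path/to/reference.fa
else
  echo "None of docker, singularity or conda was found" >&2
fi
```

The order of arguments to `first_available` sets the preference, so reorder them if, for example, singularity should be preferred over docker on a shared cluster.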
A minimal example dataset for testing purposes can be found in the EpiDiverse/datasets repository. You can either download the files manually and run the pipeline as shown above, or run the pipeline directly with the test profile, which automatically downloads the data for you:
NXF_VER=20.07.1 nextflow run epidiverse/snp -profile test,<docker|singularity|conda>
The EpiDiverse/snp pipeline is part of the EpiDiverse Toolkit, a best-practice suite of tools intended for the study of Ecological Plant Epigenetics. Links to general guidelines and pipeline-specific documentation can be found below:
- Installation
- Pipeline configuration
- Running the pipeline
- Understanding the results
- Runtime and memory usage guidelines
- Troubleshooting
These scripts were originally written for use by the EpiDiverse European Training Network, by Adam Nunn (@bio15anu).
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 764965.
If you use epidiverse/snp for your analysis, please cite it using the following doi: