The ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) pipeline typically follows several steps to analyze paired-end sequencing data and identify regions of open chromatin (peaks). Here's a basic outline of the process:
FastQC: Performs initial quality checks on the raw sequencing data, assessing sequence quality, GC content, overrepresented sequences, and related metrics.
Cutadapt: Removes low-quality bases, adapter sequences, and other artifacts that may affect downstream analysis.
Kraken2: Helps detect contamination by identifying unexpected organisms in the sample.
FastQ Screen: Screens the sequencing data against a database of known contaminants, such as adapter sequences, PhiX control, and other sources, to identify and quantify contamination.
Bowtie2: Aligns paired-end reads to the reference genome.
Picard MarkDuplicates: Removes duplicate reads (typically PCR duplicates) introduced during library preparation.
Genrich: Calls peaks, identifying regions of open chromatin.
deepTools: Assesses the quality of the peaks.
ChIPseeker: Annotates open chromatin regions with their associated genomic features and performs functional annotation of these regions.
MultiQC: Generates an interactive HTML report that provides a concise summary of the results.
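As a rough illustration of what the first few steps above do for a single sample, the sketch below prints (but does not run) commands for quality control, adapter trimming, and alignment. The sample name, directory layout, index prefix, thread count, quality and length cutoffs, and the Nextera adapter sequence are all assumptions made for illustration; the rules in this repository may use different options.

```shell
# Sketch only: prints per-sample commands for the first three steps.
# All paths and parameters below are hypothetical placeholders.
SAMPLE="${SAMPLE:-sample1}"
THREADS="${THREADS:-4}"
INDEX="${INDEX:-ref/genome}"       # Bowtie2 index prefix (assumed)
ADAPTER="CTGTCTCTTATACACATCT"      # Nextera adapter, common in ATAC-seq (assumed)

print_commands() {
  local s="$1"
  # 1. Quality report on the raw reads
  echo "fastqc -o qc/ raw/${s}_R1.fastq.gz raw/${s}_R2.fastq.gz"
  # 2. Trim adapters (-a/-A) and low-quality ends (-q), drop short reads (-m)
  echo "cutadapt -a ${ADAPTER} -A ${ADAPTER} -q 20 -m 20 -o trimmed/${s}_R1.fastq.gz -p trimmed/${s}_R2.fastq.gz raw/${s}_R1.fastq.gz raw/${s}_R2.fastq.gz"
  # 3. Paired-end alignment with fragments up to 2 kb (-X 2000), then coordinate sort
  echo "bowtie2 -p ${THREADS} -X 2000 -x ${INDEX} -1 trimmed/${s}_R1.fastq.gz -2 trimmed/${s}_R2.fastq.gz | samtools sort -o aligned/${s}.bam -"
}

print_commands "$SAMPLE"
```

Printing the commands first makes it easy to review them before execution. In the workflow itself, Snakemake builds and runs the equivalent commands for every sample, so this is only meant to show what happens under the hood.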
Clone the Repository: Clone the new repository to your local machine, choosing the directory where you want to perform data analysis. Instructions for cloning can be found here.
Tailor the workflow to your project's requirements:
Edit config.yaml in the config/ directory to set up the workflow execution parameters.
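To give a feel for the kind of settings such a file usually holds, the sketch below writes a small example file. Every key in it (samples, reference, bowtie2_index, threads) is a hypothetical placeholder, and the file name config.example.yaml is chosen so that the real config/config.yaml is not overwritten. The config.yaml shipped with the repository defines the actual parameter names.

```shell
# Sketch only: writes an EXAMPLE file, not the real config/config.yaml.
# Every key below is a hypothetical placeholder; check the repository's
# own config.yaml for the actual parameter names.
EXAMPLE_CONFIG="${EXAMPLE_CONFIG:-config.example.yaml}"

cat > "$EXAMPLE_CONFIG" <<'EOF'
# hypothetical keys, for illustration only
samples: config/samples.tsv
reference:
  bowtie2_index: /path/to/bowtie2/index/genome
threads: 8
EOF

# Show what was written
cat "$EXAMPLE_CONFIG"
```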
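Load Snakemake (on systems that provide it as an environment module):
module load snakemake/8.4.8
Create a Conda Environment (replace $NAME with an environment name of your choice):
conda create -n $NAME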
Activate the Conda Environment:
conda activate $NAME
Install mamba:
conda install -c conda-forge mamba
Test the Configuration: Perform a dry-run to validate your setup:
snakemake --use-conda -np
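One possible way to make the dry-run a gate before a real run is to wrap it in a small function and check its exit status. The function name dry_run and the messages are assumptions for illustration; only the snakemake command itself comes from the step above.

```shell
# Sketch: run the dry-run and report whether it succeeded.
dry_run() {
  # Fail early with a clear message if snakemake is not on PATH
  if ! command -v snakemake >/dev/null 2>&1; then
    echo "snakemake not found; load the module or activate the environment first" >&2
    return 127
  fi
  # -n: dry-run (nothing is executed), -p: print the shell commands
  snakemake --use-conda -np
}

if dry_run; then
  echo "Dry-run succeeded; the workflow is ready to execute."
else
  echo "Dry-run failed; fix the reported problems before running." >&2
fi
```

A non-zero exit status usually points to a missing input file, a typo in config.yaml, or a rule that cannot be resolved.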
Local Execution: Execute the workflow on your local machine using $N cores:
snakemake --use-conda --cores $N
Here, $N represents the number of cores you wish to allocate for the workflow.
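If you are unsure what value to use for $N, a minimal sketch like the following derives it from the machine's core count. Leaving one core free for the system is an assumption made here, not a requirement of the workflow, and pick_cores is a hypothetical helper name.

```shell
# Sketch: pick a core count, leaving one core free for the system (assumed policy).
pick_cores() {
  local total
  # nproc is available on most Linux systems; fall back to getconf, then to 1
  total="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  if [ "$total" -gt 1 ]; then
    echo $((total - 1))
  else
    echo 1
  fi
}

N="$(pick_cores)"
# Print the resulting command for review instead of running it
echo "snakemake --use-conda --cores $N"
```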