Sequencing Facility ATAC-Seq pipeline (SF_ATAC-seq)

The ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) pipeline typically follows several steps to analyze paired-end sequencing data and identify regions of open chromatin (peaks). Here's a basic outline of the process:

Quality Control (QC):

FastQC: Perform initial quality checks on raw sequencing data to assess sequence quality, GC content, overrepresented sequences, etc.

Cutadapt: Remove low-quality bases, adapter sequences, or other artifacts that may affect downstream analysis.

Kraken2: Helps detect contamination by identifying unexpected organisms in the sample.

FastQ Screen: Screens sequencing data against a database of known contaminants, such as adapter sequences, PhiX control, and various other sources, and quantifies how much of each is present in the sequencing data.

Alignment:

Bowtie2: Aligns paired-end reads to the reference genome.

Post-alignment Processing:

Picard MarkDuplicates: Identifies duplicate reads, such as PCR duplicates introduced during library preparation, so that they can be removed.

Genrich: Performs peak calling to identify regions of open chromatin (peaks).

deepTools: Assesses the quality of peaks.

ChIPseeker: Identifies the genomic features associated with open chromatin regions and performs functional annotation of these regions.

MultiQC: Generates an interactive HTML report that provides a concise summary of the results.

Usage

Step 1: Obtain a Copy of the Workflow

Clone the Repository: Clone the repository to your local machine, into the directory where you want to perform the data analysis. Instructions for cloning can be found here.

Step 2: Configure the Workflow

Tailor the workflow to your project's requirements:

Edit config.yaml in the config/ directory to set up the workflow execution parameters.
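Before moving on, it can be useful to confirm that the configuration file is where the workflow expects it. The following is a minimal sketch, not part of the pipeline itself: the function name check_config is hypothetical, and the default path config/config.yaml assumes the command is run from the repository root.

```shell
# Hypothetical helper: confirm the configuration file exists and is not empty.
# The default path assumes the current directory is the repository root.
check_config() {
    config_file="${1:-config/config.yaml}"
    if [ ! -s "$config_file" ]; then
        echo "Missing or empty configuration file: $config_file" >&2
        return 1
    fi
    echo "Found configuration file: $config_file"
}
```

Calling check_config with no arguments checks config/config.yaml; a different path can be passed as the first argument.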

Step 3: Load Snakemake version 8 or above

module load snakemake/8.4.8
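Since the workflow expects Snakemake 8 or above, the loaded version can be checked before continuing. This is a minimal sketch: the helper name version_major_at_least is hypothetical, and it assumes that snakemake --version prints a dotted version string such as 8.4.8.

```shell
# Hypothetical helper: succeed only if the major component of a dotted
# version string is greater than or equal to the required major version.
version_major_at_least() {
    version="$1"
    required="$2"
    major="${version%%.*}"    # keep the text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;    # major component is not a plain integer
    esac
    [ "$major" -ge "$required" ]
}

# Check the Snakemake found on PATH, if there is one.
if command -v snakemake >/dev/null 2>&1; then
    if version_major_at_least "$(snakemake --version)" 8; then
        echo "Snakemake version is sufficient"
    else
        echo "Snakemake 8 or above is required" >&2
    fi
else
    echo "snakemake not found on PATH" >&2
fi
```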

Step 4: Create a Conda environment

conda create -n $NAME

Here, $NAME is the name you choose for the environment.

Step 5: Execute the Workflow

Activate the Conda Environment:

conda activate $NAME

Install mamba:

conda install -c conda-forge mamba

Test the Configuration: Perform a dry-run to validate your setup:

snakemake --use-conda -np

Local Execution: Execute the workflow on your local machine using $N cores:

snakemake --use-conda --cores $N

Here, $N represents the number of cores you wish to allocate for the workflow.
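The dry-run and the local run can be chained so that the real run starts only when the dry-run succeeds. The sketch below is one way to do that under stated assumptions: the function names pick_cores and run_workflow and the SNAKEMAKE override variable are hypothetical, and the Snakemake options are the ones shown above.

```shell
# Hypothetical helper: choose a core count. An explicit first argument wins;
# otherwise fall back to the number of online processors.
pick_cores() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    elif command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Hypothetical wrapper: perform the dry-run first and start the real run only
# if the dry-run succeeds. SNAKEMAKE may point at a different executable.
run_workflow() {
    cores="$(pick_cores "${1:-}")"
    runner="${SNAKEMAKE:-snakemake}"
    if ! "$runner" --use-conda -np; then
        echo "Dry-run failed; fix the configuration before running." >&2
        return 1
    fi
    "$runner" --use-conda --cores "$cores"
}
```

For example, run_workflow 8 would use 8 cores, while run_workflow with no argument would use every core the machine reports, which may not be appropriate on a shared node.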

Contact

CCRSF_IFX@nih.gov
