The pipeline requires Singularity (v. 3) and Nextflow (>= v. 20.10.0). Containers with the software for each step are pulled from the Sylabs cloud library (https://cloud.sylabs.io/library).
Paths to various generic files (e.g., bwa indices) must be included in the nextflow.config file -- check that file and change paths accordingly. These include:
- Blacklist bed files for each genome
- Chrom size files for each genome
- bwa indices (compatible with bwa v. 0.7.15)
- TSS files (BED6 files denoting TSS positions)
When launching the pipeline, as shown in the nextflow command below, you'll also need to set the following:
- The location of the results directory (e.g., --results /path/to/results)
- The location of the barcode whitelist (e.g., --barcode-whitelist /path/to/737K-arc-v1.txt). The 10X ATAC v1 (737K-cratac-v1.txt) and 10X ATAC multiome (737K-arc-v1.txt) whitelists are included in this repo.
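To sanity-check a whitelist choice before launching, the fraction of observed barcodes that appear in the whitelist can be estimated with a few lines of Python. This is only a sketch: it assumes the whitelist is a plain text file with one barcode per line, and it uses exact string matching. The function names are hypothetical, and the pipeline's own matching or correction logic may differ.

```python
import gzip


def load_whitelist(path):
    """Read a whitelist with one barcode per line (assumed layout); .gz is handled by extension."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def fraction_matching(barcodes, whitelist):
    """Return the fraction of observed barcodes found verbatim in the whitelist."""
    barcodes = list(barcodes)
    if not barcodes:
        return 0.0
    return sum(bc in whitelist for bc in barcodes) / len(barcodes)
```

A low fraction for one whitelist and a high fraction for the other is a reasonable hint that the wrong whitelist was passed to --barcode-whitelist.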
You can split the fastq files into chunks using the --chunks parameter (default: 1, meaning no chunking). In the case of very large fastq files this can speed up processing.
You can generate output plots of the (pseudobulk ATAC) signal at gene TSS by adding gene names to the params.plot_signal_at_genes variable (these gene names must be present in the TSS files). By default only the signal at the GAPDH TSS is plotted.
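Because gene names listed in params.plot_signal_at_genes must be present in the TSS files, it can help to check them up front. The sketch below assumes the gene name sits in the fourth (name) column of the BED6 file, which is the standard BED layout but is not confirmed by this README. The function name and the example coordinates are made up for illustration.

```python
def genes_missing_from_tss(bed_lines, genes):
    """Return the requested gene names that never appear in the BED name column."""
    present = set()
    for line in bed_lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected 6 BED columns, got {len(fields)}: {line!r}")
        present.add(fields[3])  # column 4 = name (assumed to hold the gene name)
    return sorted(set(genes) - present)


# Made-up coordinates, for illustration only.
example_bed = [
    "chr1\t100\t101\tGAPDH\t.\t+\n",
    "chr2\t500\t501\tACTB\t.\t-\n",
]
print(genes_missing_from_tss(example_bed, ["GAPDH", "INS"]))
```

With a real file, the same function can be called on an open file handle, since it only iterates over lines.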
Lastly, you'll need to provide information about each ATAC-seq library, including the genome(s) for the species that each library includes and the paths to the fastq files for each readgroup. Organize this information in a JSON file, as in library-config.json. Note that each readgroup requires three fastq files: the first and second insert reads ('1' and '2') and the read with the nuclear barcode ('index').
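The following sketch builds and checks such a config in Python. The nesting used here (a top-level "libraries" key, then "genome" and "readgroups" per library) is an assumption for illustration, as are the library name, readgroup name, and paths; library-config.json in the repo is the authoritative template. Only the three per-readgroup keys ('1', '2', 'index') come from the description above.

```python
import json

REQUIRED_FASTQ_KEYS = {"1", "2", "index"}


def check_library_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for library, info in config.get("libraries", {}).items():
        if not info.get("genome"):
            problems.append(f"{library}: no genome listed")
        readgroups = info.get("readgroups", {})
        if not readgroups:
            problems.append(f"{library}: no readgroups listed")
        for readgroup, fastqs in readgroups.items():
            missing = REQUIRED_FASTQ_KEYS - set(fastqs)
            if missing:
                problems.append(f"{library}/{readgroup}: missing fastq entries {sorted(missing)}")
    return problems


# Hypothetical library; names and paths are placeholders.
example = {
    "libraries": {
        "library_A": {
            "genome": ["hg38"],
            "readgroups": {
                "library_A_rg1": {
                    "1": "/path/to/library_A.1.fastq.gz",
                    "2": "/path/to/library_A.2.fastq.gz",
                    "index": "/path/to/library_A.index.fastq.gz",
                }
            },
        }
    }
}

print(json.dumps(example, indent=4))
print(check_library_config(example))
```

Running a check like this before launching catches a forgotten 'index' file early, rather than partway through a long run.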
Once you have all of the above information, you can run the pipeline as follows (in this case, indicating the path to the results on the command line):
nextflow run -resume -params-file library-config.json --barcode-whitelist /path/to/737K-arc-v1.txt --results /path/to/results /path/to/main.nf
The pipeline writes the following output to the results directory:
- ataqv/bulk/*.{json.gz,out}: Pseudobulk ataqv output for each library
- ataqv/bulk/ataqv-viewer-{genome}: Pseudobulk ataqv HTML reports
- ataqv/single-nucleus/*.png: Plots of per-barcode ataqv metrics
- ataqv/single-nucleus/*.txt.gz: Per-barcode ataqv metrics in txt format, as output by ataqv
- ataqv/single-nucleus/*.txt: Per-barcode ataqv metrics in txt format as output by ataqv, plus some additional metrics
- ataqv/single-nucleus/*.suggested-thresholds.tsv: Suggested min HQAA threshold for the library, based on multi-otsu thresholding of the HQAA distribution
- bigwig/*.bw: Pseudobulk bigwig files for each library
- bigwig/plot/*.png: Pseudobulk ATAC signal at gene TSS for selected genes
- counts/*: Peak count matrix (based on pseudobulk peaks)
- fastqc/*: QC of raw sequencing reads
- fragment-file/*: Tabixed fragment file generated with sinto
- macs2/*: Pseudobulk peak calling results for each library
- mark_duplicates/*: Unfiltered BAM files, with duplicates marked
- multiqc/*: multiqc summaries of fastqc results, before and after adapter trimming
- plot-barcodes-matching-whitelist: Plot displaying percentage of barcodes matching barcode whitelist
- prune/*: Filtered BAM files
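The suggested HQAA threshold is described as coming from multi-otsu thresholding of the per-barcode HQAA distribution (HQAA being ataqv's count of high-quality autosomal alignments). To give a feel for the technique, here is a small pure-Python sketch of three-class Otsu thresholding by exhaustive search. The number of classes, the log10 transform, the bin count, and the function name are all assumptions made for illustration. The pipeline's actual implementation, and which of the resulting thresholds it reports, may differ.

```python
import math


def multi_otsu_thresholds(values, bins=64):
    """Three-class Otsu on log10(values); returns (low, high) cut points on the original scale."""
    logs = [math.log10(v) for v in values if v > 0]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("need at least two distinct positive values")
    width = (hi - lo) / bins

    # Histogram of the log-transformed values.
    counts = [0] * bins
    for x in logs:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    # Prefix sums: weight[i] and moment[i] cover bins 0..i-1.
    weight, moment = [0], [0.0]
    for c, m in zip(counts, centers):
        weight.append(weight[-1] + c)
        moment.append(moment[-1] + c * m)

    def class_score(start, stop):
        # Contribution of bins start..stop-1 to the between-class variance.
        w = weight[stop] - weight[start]
        s = moment[stop] - moment[start]
        return s * s / w if w else 0.0

    # Try every pair of cut points and keep the one maximising between-class variance.
    best, best_cut = -1.0, None
    for t1 in range(1, bins - 1):
        for t2 in range(t1 + 1, bins):
            score = class_score(0, t1) + class_score(t1, t2) + class_score(t2, bins)
            if score > best:
                best, best_cut = score, (t1, t2)

    t1, t2 = best_cut
    return 10 ** (lo + t1 * width), 10 ** (lo + t2 * width)


# Synthetic HQAA-like counts with three well-separated groups (not real data).
synthetic = [20, 30, 40] * 50 + [2000, 3000, 4000] * 50 + [200000, 300000, 400000] * 50
low, high = multi_otsu_thresholds(synthetic)
print(low, high)
```

On real libraries the groups overlap, so any suggested threshold is best treated as a starting point and checked against the per-barcode plots in ataqv/single-nucleus/.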