This concept can be applied to, but is not limited to, ATAC-seq, ChIP-seq, and Cut&Tag.
Below are the major steps, each with an example bioinformatic tool, for differential analysis of ATAC-seq and ChIP-seq data that incorporates copy number normalization for samples with different karyotypes or copy numbers. Users are free to choose alternative tools.
Raw reads to fastq: bcl2fastq
Read alignment: bwa
Read filtering: picard, samtools, bedtools
Peak calling: macs2
Signal quantification: htseq
Data normalization and differential analysis: DESeq2, DiffBind
These steps can be run separately and integrated into differential analysis pipelines for other count-based functional genomics assays.
- Calling local relative copy number ratio: CNVkit
Input data: genomic sequencing data or ChIP-seq input data.
Copy number ratio (CNR) = Copy_number_perturbed_sample / Copy_number_control_sample
E.g., in Down syndrome (trisomy 21) compared with a euploid sample, the CNR for chr21 is 3/2 = 1.5 and the CNR for other regions is 2/2 = 1. If there is a relative copy number loss, the CNR will be < 1.
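The CNR arithmetic above can be sketched in a few lines of Python. The function names here are hypothetical and only illustrate the definition. The second helper assumes the copy number caller reports values as log2 ratios (as CNVkit does in its output tables), in which case the linear CNR is 2 raised to that value.

```python
import math


def copy_number_ratio(cn_perturbed: float, cn_control: float) -> float:
    """CNR = copy number in the perturbed sample / copy number in the control sample."""
    if cn_control <= 0:
        raise ValueError("control copy number must be positive")
    return cn_perturbed / cn_control


def log2_to_cnr(log2_ratio: float) -> float:
    """Convert a log2 copy number ratio (assumed input format) to a linear CNR."""
    return 2.0 ** log2_ratio


# Trisomy 21 compared with a euploid control
cnr_chr21 = copy_number_ratio(3, 2)  # chr21: three copies versus two
cnr_other = copy_number_ratio(2, 2)  # all other regions: two copies versus two
cnr_loss = copy_number_ratio(1, 2)   # a relative copy number loss gives CNR < 1

print(cnr_chr21, cnr_other, cnr_loss)
```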
- Assigning peaks to DNA segments: bedtools closest
Then modify the read/fragment counts in the peak count matrix using the CNR as a scaling factor. For peaks with CNR > 1, divide the read/fragment count in perturbed_sample by CNR; otherwise, multiply the read/fragment count in perturbed_sample by CNR. This is to avoid inflating the statistical power to detect differential signals.
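As a minimal sketch of this step, the Python below assigns each peak the CNR of the nearest segment on the same chromosome and then applies the scaling rule exactly as stated above. It is a pure-Python stand-in for illustration, not a reimplementation of bedtools closest. The tuple layouts, function names, default CNR of 1.0 for peaks without a segment, and the example coordinates and counts are all assumptions.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]            # (chrom, start, end), half-open coordinates
Segment = Tuple[str, int, int, float]  # (chrom, start, end, CNR)


def gap(peak: Peak, seg: Segment) -> int:
    """Distance in bp between a peak and a segment; 0 if they overlap."""
    _, p_start, p_end = peak
    _, s_start, s_end, _ = seg
    if p_end <= s_start:
        return s_start - p_end
    if s_end <= p_start:
        return p_start - s_end
    return 0


def closest_cnr(peak: Peak, segments: List[Segment], default: float = 1.0) -> float:
    """CNR of the nearest segment on the peak's chromosome (default if none)."""
    same_chrom = [s for s in segments if s[0] == peak[0]]
    if not same_chrom:
        return default
    return min(same_chrom, key=lambda s: gap(peak, s))[3]


def scale_count(count: float, cnr: float) -> float:
    """Scaling rule as written in the text: divide if CNR > 1, otherwise multiply."""
    return count / cnr if cnr > 1 else count * cnr


# Hypothetical segments and perturbed-sample peak counts
segments: List[Segment] = [
    ("chr1", 0, 248_000_000, 1.0),
    ("chr21", 0, 46_000_000, 1.5),
]
perturbed_counts: Dict[Peak, float] = {
    ("chr1", 5_000, 5_600): 120,
    ("chr21", 1_000, 1_500): 300,
}

adjusted = {
    peak: scale_count(count, closest_cnr(peak, segments))
    for peak, count in perturbed_counts.items()
}
print(adjusted)
```

The adjusted values are generally non-integer, so they may need rounding before being passed to a tool that expects integer counts.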