This repository contains the raw data processing and downstream analysis and visualization code for the manuscript "Spatial joint profiling of DNA methylome and transcriptome in mammalian tissues".
Next-generation sequencing (NGS) was performed on an Illumina NovaSeq 6000 sequencer in 150 bp paired-end mode. Read 1 contains the genomic sequence, and Read 2 contains spatial barcodes A and B and, for mRNA, the UMIs.
The preprocessing pipelines for both spatial DNA methylation and spatial RNA data are built on the Snakemake workflow management system.
To obtain RNA count matrices, change all directory paths in the Snakefile before running it:
(1) Setup of directories and files: Automate the generation of directories to store each sample's raw and processed data.
: Use bbduk to filter sequences containing primers from the reads.
& filter_L2
: Apply additional filters to select reads with specific linker sequences.
: Extract spatial barcodes and UMIs and reformat the data.
: Align reads to a reference genome (e.g. mm10) using STAR.
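The barcode and UMI extraction step can be illustrated with a minimal Python sketch. The Read 2 offsets (`UMI_SLICE`, `BARCODE_B_SLICE`, `BARCODE_A_SLICE`), the whitelist dictionaries, and the row-major pixel numbering are hypothetical assumptions, not values taken from the Snakefile, so check the pipeline for the real read layout. The sketch also assumes a 50 × 50 barcode grid, consistent with the 2,500 per-pixel files produced by the methylation pipeline.

```python
from typing import Dict, NamedTuple, Optional

# HYPOTHETICAL Read 2 layout (0-based, end-exclusive). Replace with the
# offsets actually used by the pipeline's barcode-extraction step.
UMI_SLICE = slice(0, 10)
BARCODE_B_SLICE = slice(10, 18)
BARCODE_A_SLICE = slice(48, 56)


class ParsedRead2(NamedTuple):
    barcode_a: str
    barcode_b: str
    umi: str
    pixel: Optional[int]  # 0-based pixel index, or None if a barcode is unknown


def parse_read2(seq: str,
                whitelist_a: Dict[str, int],
                whitelist_b: Dict[str, int],
                grid: int = 50) -> ParsedRead2:
    """Cut barcodes A/B and the UMI out of a Read 2 sequence.

    Whitelists map each barcode sequence to its 0-based index; the pixel is
    numbered row-major with barcode A as the row (a hypothetical convention).
    """
    a = seq[BARCODE_A_SLICE]
    b = seq[BARCODE_B_SLICE]
    umi = seq[UMI_SLICE]
    ia = whitelist_a.get(a)
    ib = whitelist_b.get(b)
    pixel = ia * grid + ib if ia is not None and ib is not None else None
    return ParsedRead2(a, b, umi, pixel)
```

Exact whitelist matching keeps the sketch short; a production pipeline may tolerate sequencing errors in barcodes.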
For spatial DNA methylation data, set the config ID to your sample's data ID. To obtain BISCUIT QC results:
runSnakemake --config ID=SpMETSLE17DM ref=mm10 --snakefile /mnt/isilon/zhoulab/labpipelines/Snakefiles/20230602_SpatialMethSeq.smk biscuit_qc_all
To obtain CG levels:
runSnakemake --config ID=SpMETSLE17DM ref=mm10 --snakefile /mnt/isilon/zhoulab/labpipelines/Snakefiles/20230602_SpatialMethSeq.smk feature_mean_all
To obtain CH levels:
runSnakemake --config ID=SpMETSLE17DM ref=mm10 --snakefile /mnt/isilon/zhoulab/labpipelines/Snakefiles/20230602_SpatialMethSeq.smk feature_mean_allc_all
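When processing several samples, these commands can be generated programmatically. The Python sketch below only assembles the arguments from the three commands listed here. `build_command`, `run_target`, and the short keys `qc`/`cg`/`ch` are hypothetical names, and the sketch assumes the lab's `runSnakemake` wrapper is on your `PATH`.

```python
import shlex
import subprocess
from typing import List

# Snakefile path and target rules copied from the commands in this README.
SNAKEFILE = "/mnt/isilon/zhoulab/labpipelines/Snakefiles/20230602_SpatialMethSeq.smk"
TARGETS = {
    "qc": "biscuit_qc_all",         # BISCUIT QC results
    "cg": "feature_mean_all",       # CG levels
    "ch": "feature_mean_allc_all",  # CH levels
}


def build_command(sample_id: str, output: str, ref: str = "mm10",
                  snakefile: str = SNAKEFILE) -> List[str]:
    """Assemble the runSnakemake argument list for one sample and output type."""
    return ["runSnakemake", "--config", f"ID={sample_id}", f"ref={ref}",
            "--snakefile", snakefile, TARGETS[output]]


def run_target(sample_id: str, output: str, ref: str = "mm10",
               dry_run: bool = True) -> None:
    """Print the command (default) or run it, raising if it exits non-zero."""
    cmd = build_command(sample_id, output, ref)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

For example, `run_target("SpMETSLE17DM", "cg")` prints the CG-level command; pass `dry_run=False` to execute it.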
: Trim the fastq files using
: Split all reads by spatial barcode, producing 2,500 FASTQ files (one per pixel).
: Align reads to a reference genome (e.g. mm10) using BISCUIT.
: Identify all CG sites and call methylation at those sites.
: Run quality checks on alignment and methylation calling.
: Obtain average methylation over selected windows.
: Identify all CH sites (H = A, C, or T) and call methylation at those sites.
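The CG and CH categories follow the standard cytosine-context definitions (H = A, C, or T). The Python sketch below classifies every cytosine on both strands of a reference sequence to illustrate those rules. `cytosine_contexts` is a hypothetical helper; it is not BISCUIT's implementation, which calls methylation from aligned reads.

```python
from typing import List, Tuple


def cytosine_contexts(ref: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, strand, context) for every classifiable cytosine.

    Context is "CG" when the next base on the same strand is G, otherwise "CH"
    (H = A, C or T). Sites next to an N or at a sequence end are skipped.
    """
    ref = ref.upper()
    calls = []
    for i, base in enumerate(ref):
        if base == "C" and i + 1 < len(ref):
            # Plus-strand cytosine: inspect the downstream base directly.
            nxt = ref[i + 1]
            if nxt == "G":
                calls.append((i, "+", "CG"))
            elif nxt in ("A", "C", "T"):
                calls.append((i, "+", "CH"))
        elif base == "G" and i > 0:
            # A plus-strand G is a minus-strand C; its downstream base on the
            # minus strand is the complement of ref[i - 1].
            prev = ref[i - 1]
            if prev == "C":
                calls.append((i, "-", "CG"))
            elif prev in ("A", "G", "T"):  # minus-strand neighbour is T, C or A
                calls.append((i, "-", "CH"))
    return calls
```

For example, `cytosine_contexts("ACGTCA")` returns `[(1, "+", "CG"), (2, "-", "CG"), (4, "+", "CH")]`.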
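Averaging methylation over selected windows can be sketched as follows, assuming per-site beta values (fraction methylated) and BED-style half-open windows. The `Site`/`Window` tuple layouts and `window_means` are hypothetical, and the pipeline's `feature_mean_all` target may weight or filter sites differently.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Site = Tuple[str, int, float]   # (chrom, 0-based position, beta in [0, 1])
Window = Tuple[str, int, int]   # (chrom, start, end), half-open like BED


def window_means(sites: Sequence[Site],
                 windows: Sequence[Window]) -> List[Optional[float]]:
    """Mean beta of the sites inside each window; None if a window has no sites."""
    # Group sites by chromosome and sort them by position.
    by_chrom: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for chrom, pos, beta in sites:
        by_chrom[chrom].append((pos, beta))
    positions: Dict[str, List[int]] = {}
    for chrom, recs in by_chrom.items():
        recs.sort()
        positions[chrom] = [p for p, _ in recs]

    means: List[Optional[float]] = []
    for chrom, start, end in windows:
        recs = by_chrom.get(chrom, [])
        pos = positions.get(chrom, [])
        # Binary search for the sites with start <= position < end.
        lo, hi = bisect_left(pos, start), bisect_left(pos, end)
        betas = [b for _, b in recs[lo:hi]]
        means.append(sum(betas) / len(betas) if betas else None)
    return means
```

Each site counts equally here; pooling methylated and total read counts across a window is a common alternative that gives well-covered sites more weight.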
Identify the location of pixels on tissue from the brightfield image using
Data analysis and visualization were performed in R (v4.4.0).
: Contains all the code to generate QC plots.
: Contains all the code to integrate the two modalities into Seurat objects, identify differential marker genes and variably methylated regions (VMRs), and visualize them on spatial maps.
: Contains all the code to integrate data from day 11 and day 13 mouse embryos.
: Contains all helper functions, including mapping VMRs to overlapping genes and iterative PCA.
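Mapping VMRs to overlapping genes reduces to an interval-overlap test. The repository's R functions are not reproduced here; the Python sketch below (`map_vmrs_to_genes` is a hypothetical name) assumes half-open coordinates and reports every gene whose body overlaps a VMR by at least one base. In R, `GenomicRanges::findOverlaps` provides the same operation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name), half-open


def map_vmrs_to_genes(vmrs: Sequence[Interval],
                      genes: Sequence[Interval]) -> Dict[str, List[str]]:
    """Return {vmr_name: [overlapping gene names]}, keeping the input gene order."""
    genes_by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))

    result: Dict[str, List[str]] = {}
    for chrom, start, end, vmr_id in vmrs:
        # Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
        result[vmr_id] = [name for g_start, g_end, name in genes_by_chrom.get(chrom, [])
                          if g_start < end and start < g_end]
    return result
```

The linear scan per VMR keeps the sketch readable; for genome-scale inputs a sorted sweep or interval tree is faster.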
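One common way to nominate variably methylated regions is to rank windows by the variance of their methylation levels across pixels. The sketch below assumes a windows × pixels matrix of beta values with `None` for pixels lacking coverage; `top_variable_windows` and its `min_pixels` cutoff are hypothetical, and the criterion used in the manuscript's code may differ.

```python
from statistics import pvariance
from typing import List, Optional, Sequence


def top_variable_windows(matrix: Sequence[Sequence[Optional[float]]],
                         k: int, min_pixels: int = 2) -> List[int]:
    """Indices of the k windows with the highest variance across covered pixels.

    Windows covered in fewer than `min_pixels` pixels are skipped; ties are
    broken by the lower window index.
    """
    scored = []
    for idx, row in enumerate(matrix):
        observed = [v for v in row if v is not None]
        if len(observed) >= min_pixels:
            scored.append((pvariance(observed), idx))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]
```

Requiring a minimum number of covered pixels avoids ranking windows whose variance rests on only one or two sparse observations.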