EnviHoli

Code repository for sedimentary ancient DNA (sedaDNA) shotgun sequencing (metagenomics) data analysis.
Contributed by: Sisi Liu, Lars Harms, Christiane Böckel, Kathleen R. Stoof-Leichsenring

General content

Scripts and Files in This Repository

Bash Scripts

This repository contains 11 bash scripts (numbered 00 to 10) located in the bash_scripts directory. These scripts are:

00-bowtie2-build-bac0.sh
01-fastqc-clumpify-fastp-dedupe.sh
02-bowtie2-a1.sh
03-bowtie2-a2.sh
04-bowtie2-a3.sh
05-merge-sort.sh
06-metaDMG.sh
07-post-metadmg-lca.sh
08-combine_lca.sh
09-mismatch.sh
10-combine_dmgout.sh

External Scripts

In addition, there are 5 external scripts in the external_scripts directory. They are written in Python or R rather than bash.

combine_dmgout.R
combine_lca.R
dedup_sam.py
mismatch.R
post-metadmg-lca.R

Files

In addition, there are 10 files in the external_files directory: a description of the sources of the raw shotgun sequencing data (raw_shotgun_data_sources.txt), the taxonomic reference data (taxonomic_reference_database.xlsx), and the age-depth models of 8 lake cores (Age-depth_*_shotgun.csv).

Detailed Description

This section describes the data analysis in detail: installation, dependencies, and step-by-step usage.

Installation

Before running the scripts, make sure to install the required dependencies.

Dependencies' manuals

Instructions on how to use the dependencies.

Usage

I. Shotgun sequencing data quality check -> deduplication -> adapter trimming and merging of paired-end reads in parallel -> deduplication -> quality check

Input raw paired-end sequencing fastq files: there are two files for each sequencing id ${FILEBASE}, named ${FILEBASE}.R1.fastq.gz and ${FILEBASE}.R2.fastq.gz (or ${FILEBASE}_R1.fastq.gz and ${FILEBASE}_R2.fastq.gz, depending on the sequencing company).
Script: bash_scripts/01-fastqc-clumpify-fastp-dedupe.sh
Output for next step (alignment): *fastp_dedupe_merged.fq.gz
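
The two naming conventions above can be handled with one pattern. The following is a minimal sketch, not part of the repository: the function name `pair_fastq_files` and the example file names are hypothetical, and the actual script may derive ${FILEBASE} differently.

```python
import re
from collections import defaultdict

# Matches both conventions: <FILEBASE>.R1.fastq.gz and <FILEBASE>_R1.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<filebase>.+)[._]R(?P<mate>[12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group file names by FILEBASE; return {filebase: (r1, r2)} for complete pairs only."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:
            mates[match.group("filebase")][match.group("mate")] = name
    return {
        base: (found["1"], found["2"])
        for base, found in sorted(mates.items())
        if "1" in found and "2" in found
    }


# Hypothetical file names, used only for illustration.
example = [
    "coreA_01.R1.fastq.gz", "coreA_01.R2.fastq.gz",
    "coreB_07_R1.fastq.gz", "coreB_07_R2.fastq.gz",
    "orphan_R1.fastq.gz",  # no R2 mate, so it is skipped
]
pairs = pair_fastq_files(example)
for filebase, (r1, r2) in pairs.items():
    print(filebase, r1, r2)
```

Files without a mate are left out of the result, which makes incomplete downloads easy to spot before the quality-control step is started.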

II. Taxonomic reference database establishment and end-to-end alignment in Bowtie2

Source data for taxonomic reference database establishment: external_files/taxonomic_reference_sources.txt
Script for taxonomic reference database establishment: bash_scripts/00-bowtie2-build-bac0.sh (runs bowtie2-build to establish the Bacteria RefSeq database. Other databases use the same script with a different path-to-db and split size.)
Input merged shotgun sequencing data: *fastp_dedupe_merged.fq.gz
Script for alignment against taxonomic reference database: bash_scripts/02-bowtie2-a1.sh, bash_scripts/03-bowtie2-a2.sh, bash_scripts/04-bowtie2-a3.sh
Output for next step (merge and sort alignments): ${FILEBASE}.$(basename $DB).bam (${FILEBASE} is the fastq file id; $(basename $DB) is the taxonomic reference database name. In total, there are 147 alignment bam files per sequencing file.)
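
Because each sequencing file yields one bam file per database, it can help to check that all of them exist before merging. The sketch below follows the ${FILEBASE}.$(basename $DB).bam naming pattern; the helper names and the database paths are hypothetical and do not come from the repository.

```python
import os


def expected_bam_names(filebase, db_paths):
    """Build the bam name expected for each database: <FILEBASE>.<basename of DB>.bam."""
    # rstrip("/") mimics shell `basename`, which ignores a trailing slash.
    return [f"{filebase}.{os.path.basename(db.rstrip('/'))}.bam" for db in db_paths]


def missing_alignments(filebase, db_paths, existing_files):
    """Return the expected bam names that are not among the existing files."""
    existing = set(existing_files)
    return [name for name in expected_bam_names(filebase, db_paths) if name not in existing]


# Hypothetical database prefixes and output files, for illustration only.
db_paths = ["/path/to/db/bac.1", "/path/to/db/bac.2", "/path/to/db/plant.1"]
present = ["sample01.bac.1.bam", "sample01.plant.1.bam"]
missing = missing_alignments("sample01", db_paths, present)
print(missing)
```

With the full list of database prefixes, an empty result would mean that all alignment files for that sequencing file are present.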

III. Merge and sort alignments

Motivation: To make sure the alignments are sorted by read ID. The sam file is sorted instead of the bam file because the header of the merged bam file is larger than 2 GB.

Input all alignments: ${FILEBASE}.$(basename $DB).bam
Script for merge and sort: bash_scripts/05-merge-sort.sh
Output for next step (taxonomic classification and ancient damage pattern analysis): ${FILEBASE}_L30.sorted.sam.gz
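
To illustrate what "sorted by read ID" means for a sam file, here is a small in-memory sketch. It is not the method used in 05-merge-sort.sh: it uses plain string ordering, leaves the header unchanged, and would not scale to real files. The function name and the records are hypothetical.

```python
def sort_sam_by_read_id(lines):
    """Return header lines (starting with '@') first, then alignments ordered by read ID."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if line and not line.startswith("@")]
    # The read ID (QNAME) is the first tab-separated field. list.sort() is stable,
    # so alignments of the same read keep their original relative order.
    records.sort(key=lambda line: line.split("\t", 1)[0])
    return header + records


# Hypothetical records, for illustration only.
sam_lines = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:ref1\tLN:1000",
    "@SQ\tSN:ref2\tLN:2000",
    "read2\t0\tref1\t10\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t0\tref2\t5\t30\t4M\t*\t0\t0\tTTGA\tIIII",
    "read2\t0\tref2\t7\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
sorted_lines = sort_sam_by_read_id(sam_lines)
for line in sorted_lines:
    print(line)
```

After sorting, all alignments of one read sit next to each other, so a downstream tool can process one read at a time.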

IV. Taxonomic classification and ancient damage pattern analysis

Taxonomic profile: Wang et al., 2022; Ancient damage pattern: Michelsen et al., 2022

Input sorted alignments: ${FILEBASE}_L30.sorted.sam.gz
Script: bash_scripts/06-metaDMG.sh
Output structure: see Michelsen et al., 2022

V. Post-processing of metaDMG output

Attach full lineage and key ranks based on tax_id: bash_scripts/07-post-metadmg-lca.sh and external_scripts/post-metadmg-lca.R
Combine taxonomic classification results: bash_scripts/08-combine_lca.sh and external_scripts/combine_lca.R
Calculate nucleotide (A, T, C, G) substitution frequencies and attach lineage information: bash_scripts/09-mismatch.sh and external_scripts/mismatch.R
Combine C>T rate of metaDMGout.csv and attach lineage information: bash_scripts/10-combine_dmgout.sh and external_scripts/combine_dmgout.R
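
As a rough illustration of a substitution frequency such as the C>T rate, the sketch below divides the count of each mismatch by the total count of its reference base. The input layout (a table of reference-base and read-base pair counts), the function name, and the numbers are assumptions made for this example. They do not reflect the metaDMG output format or the exact calculation in mismatch.R.

```python
from collections import Counter

BASES = "ACGT"


def substitution_frequencies(pair_counts):
    """Turn counts of (reference base, read base) pairs into per-reference-base frequencies."""
    totals = Counter()
    for (ref, _read), count in pair_counts.items():
        totals[ref] += count
    freqs = {}
    for ref in BASES:
        for read in BASES:
            # Skip matches, and skip reference bases that were never observed.
            if ref != read and totals[ref] > 0:
                freqs[f"{ref}>{read}"] = pair_counts.get((ref, read), 0) / totals[ref]
    return freqs


# Hypothetical counts at one read position, for illustration only.
counts = {
    ("C", "C"): 90, ("C", "T"): 10,
    ("G", "G"): 95, ("G", "A"): 5,
    ("A", "A"): 100, ("T", "T"): 100,
}
freqs = substitution_frequencies(counts)
print(freqs["C>T"], freqs["G>A"])
```

In this toy table, 10 of the 100 reference C positions are read as T, so the C>T frequency is 10/100.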
