Skip to content

Commit

Permalink
Creating repository, initial state
Browse files Browse the repository at this point in the history
  • Loading branch information
kjurdzinski committed May 21, 2023
0 parents commit c2809ba
Show file tree
Hide file tree
Showing 30 changed files with 1,230,876 additions and 0 deletions.
Empty file added OTUs_16S/OTUs.txt
Empty file.
Empty file added OTUs_16S/OTUs_membership.txt
Empty file.
798 changes: 798 additions & 0 deletions OTUs_16S/log.txt

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions SLURM_chimera_dbOTU.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#!/bin/bash -l

#SBATCH -A naiss2023-5-110
#SBATCH -p core
#SBATCH -n 20
#SBATCH --mail-type=END
#SBATCH [email protected]
#SBATCH -t 06:00:00
#SBATCH -J chimera_dbOTU_clustering

cd /crex/proj/snic2020-6-126/projects/plankton_monitoring/P20310/ASV_reannotation
conda activate ASV-clustering
bash workflow_all.sh 16S
27 changes: 27 additions & 0 deletions Snakefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
import pandas as pd
from snakemake.utils import min_version
min_version("5.3.0")
# configfile: "config.yml"

rule sort_ASVs:
"""
Sort ASVs by abundance
"""
input:
"data/asv_seqs.fasta"
"data/asv_counts.tsv"
output:
"temporary/asv_seqs_sorted.fasta"
shell:
"""
if [ ! -d "chimeras" ]
then
mkdir chimeras
fi
if [ ! -d "temporary" ]
then
mkdir temporary
fi
python3 scripts/sort_ASVs_by_abundance.py data/asv_seqs.fasta \
-c data/asv_counts.tsv -o temporary/asv_seqs_sorted.fasta
"""
17 changes: 17 additions & 0 deletions chimera_ASVs_workflow.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
if [ ! -d "chimeras" ]
then
mkdir chimeras
fi
if [ ! -d "temporary" ]
then
mkdir temporary
fi
python3 scripts/sort_ASVs_by_abundance.py data/asv_seqs.fasta \
-c data/asv_counts.tsv -o temporary/asv_seqs_sorted.fasta
vsearch --threads 6 --dn 1.4 --mindiffs 3 \
--mindiv 0.8 --minh 0.28 --abskew 2.0 \
--chimeras chimeras/chimeras.fasta --borderline chimeras/borderline.fasta \
--nonchimeras chimeras/nonchimeras.fasta --uchimealns chimeras/uchimealns.out \
--uchimeout chimeras/uchimeout.txt --uchimealns chimeras/uchimealns.txt \
--uchime_denovo temporary/asv_seqs_sorted.fasta \
>logs/log_uchime.txt 2>&1
3,847 changes: 3,847 additions & 0 deletions chimeras_16S/borderline.fasta

Large diffs are not rendered by default.

14,289 changes: 14,289 additions & 0 deletions chimeras_16S/chimeras.fasta

Large diffs are not rendered by default.

262,578 changes: 262,578 additions & 0 deletions chimeras_16S/nonchimeras.fasta

Large diffs are not rendered by default.

262,578 changes: 262,578 additions & 0 deletions chimeras_16S/nonchimeras_clean.fasta

Large diffs are not rendered by default.

102,114 changes: 102,114 additions & 0 deletions chimeras_16S/uchimealns.txt

Large diffs are not rendered by default.

40,369 changes: 40,369 additions & 0 deletions chimeras_16S/uchimeout.txt

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions clean.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
rm -r temporary_$1/
rm -r chimeras_$1/
rm -r OTUs_$1/
Empty file added config.yml
Empty file.
87 changes: 87 additions & 0 deletions config_old.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# Update with the name of a subdirectory directory under data/ that contains
# asv_seqs.fasta, asv_counts.tsv and asv_taxa.tsv
rundir: "16S"
# Specify a name of the run. Using different run_name you can run clustering
# with different parameters using the same vsearch output.
run_name: "default"
# At what rank should the data be split prior to aligning and clustering
split_rank: "Family"
# Method to use for calculating representative sequences for clusters
# select between "median", "mean" and "sum"
rep_method: "median"

chimera:
run_name: "chimera1"
# Should chimera detection and removal be performed on the input data?
remove_chimeras: True
# Select method for chimera filtering:
#'batchwise' = run chimera detection on dataset as a whole.
#'samplewise' = split input fasta into one file per sample and run chimera
# detection on each sample individually.
method: "batchwise"
# Select algorithm to use, you can choose from 'uchime_denovo', 'uchime2_denovo'
# and 'uchime3_denovo'.
algorithm: "uchime_denovo"
# In batchwise method, require that a sequence marked as chimeric is present
# with its parents in at least <min_samples_shared> samples
min_samples_shared: 1
# In batchwise method, require that a sequence marked as chimeric is present
# with its parents in at least <min_frac_samples_shared> fraction of samples
min_frac_samples_shared: 0.5
# In samplewise method, require that a sequence is marked as chimeric in at least
# <min_chimeric_samples> in order for it to be removed from analysis. If this
# value is set to 0, ASVs have to be marked as chimeric in all samples
min_chimeric_samples: 0
# Chimera detection settings
dn: 1.4
mindiffs: 3
mindiv: 0.8
minh: 0.28

# Specify what clustering software to use. Choose from 'swarm', 'opticlust',
# 'dbotu3' and 'lulu'.
software:
# - "swarm"
# - "opticlust"
# - "lulu"
- "dbotu3"
evaluation_rank: "Species"

# Tool-specific settings
vsearch:
threads: 10
id: 0.84
iddef: "1"
query_cov: 0.9

opticlust:
# For opticlust, choose whether pairwise alignments should be generated with
# 'vsearch' or 'mothur'
aligner: "vsearch"
delta: 0.0001
cutoff: 0.03
initialize: "singleton"
precision: 1000
threads: 10

swarm:
differences: 1
no-otu-breaking: False
fastidious: True
boundary: 3
threads: 10
match-reward: 5
mismatch-penalty: 4
gap-opening-penalty: 12
gap-extension-penalty: 4

dbotu3:
dist: 0.1
abund: 10.0
pval: 0.0005

lulu:
minimum_ratio_type: "min"
minimum_ratio: 1
minimum_match: 84
minimum_relative_cooccurence: 0.95
40,370 changes: 40,370 additions & 0 deletions data_16S/asv_counts.tsv

Large diffs are not rendered by default.

Loading

0 comments on commit c2809ba

Please sign in to comment.