Choose Kraken2-filtered host #350
Comments
Hi @tetedange13! This is an interesting approach. So if I understand correctly, rather than creating a subsetted Kraken database, we would provide a full-fledged one and specify the taxonomy ID of a genome in the database instead? Could you provide some more information (or, even better, a PR) that shows how we could add this to the pipeline? For now, it doesn't have to be an nf-core module; we can add that later.
Hi @drpatelh,
No need to provide a full-fledged Kraken index; you can keep going with your subsetted one this way.
Current command to run Kraken:
kraken2 \
    --db kraken2_human \
    --threads 8 \
    --unclassified-out ${prefix}.unclassified#.fastq \
    --classified-out ${prefix}.classified#.fastq \
    --report ${prefix}.kraken2.report.txt \
    --gzip-compressed \
    --paired \
    --report-zero-counts \
    ${prefix}_1.trim.fastq.gz ${prefix}_2.trim.fastq.gz
Would become:
kraken2 \
    --db kraken2_human \
    --threads 8 \
    --output ${prefix}.classifiedreads.kraken2.txt \
    --report ${prefix}.kraken2.report.txt \
    --gzip-compressed \
    --paired \
    --report-zero-counts \
    ${prefix}_1.trim.fastq.gz ${prefix}_2.trim.fastq.gz
=> Instead of outputting
Then we would use:
extract_kraken_reads.py \
    -k ${prefix}.classifiedreads.kraken2.txt \
    --report ${prefix}.kraken2.report.txt \
    -s1 ${prefix}_1.trim.fastq.gz -s2 ${prefix}_2.trim.fastq.gz \
    --taxid 9606 \
    --exclude \
    -o ${prefix}.unclassified_1.fastq.gz -o2 ${prefix}.unclassified_2.fastq.gz
=> In my example I kept the current filename convention
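To make concrete which reads `--taxid 9606 --exclude` is meant to keep, here is a minimal sketch that lists the IDs of reads not assigned to the host taxid, straight from the per-read file written by `kraken2 --output`. It assumes the standard tab-separated layout (column 1 = C/U status, column 2 = read ID, column 3 = taxid) and that `--use-names` was not set. The function name `non_host_read_ids` is hypothetical, and the match is on the exact taxid only (no child taxa), so treat it as an illustration rather than a replacement for extract_kraken_reads.py.

```shell
# Hypothetical helper: print the IDs of reads NOT assigned to the host taxid.
# Assumes Kraken2's standard per-read output:
#   col 1 = C/U status, col 2 = read ID, col 3 = assigned taxid (0 if unclassified).
non_host_read_ids() {
    host_taxid="$1"       # e.g. 9606 for human
    kraken_output="$2"    # file written by `kraken2 --output`
    awk -F'\t' -v host="$host_taxid" '$3 != host { print $2 }' "$kraken_output"
}
```

For example, `non_host_read_ids 9606 ${prefix}.classifiedreads.kraken2.txt` would print both unclassified reads and reads classified to any non-human taxon.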
I am sorry, but I do not know Nextflow well enough to be able to submit a PR.
=> And the expected output is simply a new pair of FASTQ files (if paired-end), possibly called
Description of feature
Hi,
This is probably related to issue #113 (comment)
We are trying to shift from detecting solely SARS-CoV-2 (using ARTIC primers) to a more "metagenomics" approach (using the Illumina RVOP kit)
=> Looks like we can stick with viralrecon for that, so first: thanks for this wonderful tool!
Currently, filtering of host reads with Kraken is based on keeping only "unassigned" reads after classification against a Kraken DB composed of Human only (deduced from here)
It would be great if we could run Kraken against a more diverse index (like the "Standard" one) through --kraken2_db, then filter out host reads by specifying the host taxid (new parameter such as --kraken_host_taxid)
=> This way we could keep filtering host reads, but also use Kraken at its full potential (to get a quick global look at what's inside our sample)
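As a rough illustration of that "global look", the sketch below pulls the species-level rows out of a Kraken2 report that exceed a percentage cutoff. It assumes the standard 6-column tab-separated report (percent, clade reads, direct reads, rank code, taxid, indented name), i.e. a report produced without extra minimizer columns. The helper name `top_species` and the cutoff argument are hypothetical and not part of viralrecon.

```shell
# Hypothetical helper: list species-level rows of a Kraken2 report above a cutoff.
# Assumes the standard 6-column report:
#   percent, clade reads, direct reads, rank code, taxid, scientific name.
top_species() {
    min_percent="$1"    # e.g. 1 keeps species with >= 1% of reads
    report="$2"         # file written by `kraken2 --report`
    awk -F'\t' -v min="$min_percent" '
        $4 == "S" && $1 + 0 >= min + 0 {
            pct = $1
            gsub(/ /, "", pct)      # the percentage column is space-padded
            name = $6
            sub(/^ +/, "", name)    # names are indented by taxonomic depth
            printf "%s\t%s\t%s\n", pct, $5, name
        }' "$report"
}
```

Running `top_species 1 ${prefix}.kraken2.report.txt` against a report built from a broader index would then show the host alongside any other species above 1% of reads.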
It could be based on extract_kraken_reads.py from the KrakenTools suite
=> But I guess it will require adding this script as an nf-core module first (or maybe I missed it?)
Thanks again!
Kind regards,
Felix.