Add option to hard trim amplicon primer sequences #290
Comments
Hi @drpatelh,

Several of the samples were difficult to sequence with very high Ct values, so we ended up trying a few different methods (ARTIC V4, 1200bp, FG) on Illumina as well as the 1200bp Freed et al. (2020) method on GridION. Initial analysis suggested that the ARTIC V4 amplicon and Fusion Genomics (FG) probe-capture data were complementary, with the FG data filling in a lot of gaps in coverage of the ARTIC data and vice versa. So naturally, we wanted to merge the reads together while only trimming primers from the ARTIC amplicon reads.

Originally, I wanted to merge the BAM files from separate viralrecon analyses on the ARTIC V4 reads with primer trimming and FG probe-capture reads without primer trimming, but unfortunately, I wouldn't have been able to use the merged BAM files as input for viralrecon, and extracting the BAM file reads to FASTQ with

The Bash script below is basically what was done to each ARTIC V4 iVar trimmed BAM file from viralrecon:

fgbio ClipBam -i 4662.ivar_trim.sorted.bam -o 4662.ivar_trim.sorted.fgbio-hardclip.bam -H -r MN908947.3.fa
samtools fastq -1 4662_1.fq -2 4662_2.fq -0 /dev/null -s 4662.single.fq 4662.sorted.bam
cat 4662_1.fq 4662_2.fq 4662.single.fq | gzip -c > 4662.fq.gz

The fq.gz files were provided as input for viralrecon, and viralrecon handled merging reads for the same samples.

Out of curiosity, I tried running viralrecon on the FG reads with ARTIC primer trimming and found that 10% of the FG reads would have primers trimmed. So I couldn't just naively run all the sequence data through viralrecon with primer trimming, because a significant amount of data would be trimmed away.

We could have used CutAdapt to trim the primer sequences from the reads, but we haven't done enough testing with CutAdapt primer trimming, and iVar has worked well for our primer trimming needs.

Let me know if you have any other questions! Thanks for putting together such a great pipeline!

Peter
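For anyone wanting to repeat the per-sample steps from the comment above on several samples, here is a minimal sketch of a wrapper. The function names `run` and `hardclip_to_fastq`, the `DRY_RUN` switch and the `.namesorted.bam` file name are hypothetical. Two things here are assumptions rather than part of the original script: the name-sort step (borrowed from the `samtools sort -n` used later in this thread) and feeding the hard-clipped BAM to `samtools fastq` (the original command reads `4662.sorted.bam`, whose origin is not shown).

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# hardclip_to_fastq SAMPLE REF: hard-clip an iVar-trimmed BAM and convert it
# back to FASTQ. Input naming (<sample>.ivar_trim.sorted.bam) follows the
# example in the comment above.
hardclip_to_fastq() {
  sample=$1
  ref=$2
  clipped="${sample}.ivar_trim.sorted.fgbio-hardclip.bam"
  namesorted="${sample}.namesorted.bam"   # hypothetical intermediate file

  # 1. Upgrade soft clips to hard clips (-H), as in the original command.
  run fgbio ClipBam -i "${sample}.ivar_trim.sorted.bam" -o "$clipped" -H -r "$ref"

  # 2. ASSUMPTION: name-sort so mates sit next to each other for samtools fastq.
  run samtools sort -n -o "$namesorted" "$clipped"

  # 3. Split into R1, R2 and singletons; other reads are discarded.
  run samtools fastq -1 "${sample}_1.fq" -2 "${sample}_2.fq" \
    -0 /dev/null -s "${sample}.single.fq" "$namesorted"

  # 4. Concatenate everything into one gzipped FASTQ, as in the original.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "cat ${sample}_1.fq ${sample}_2.fq ${sample}.single.fq | gzip -c > ${sample}.fq.gz"
  else
    cat "${sample}_1.fq" "${sample}_2.fq" "${sample}.single.fq" | gzip -c > "${sample}.fq.gz"
  fi
}
```

Running `DRY_RUN=1 hardclip_to_fastq 4662 MN908947.3.fa` only prints the four commands, which is a cheap way to check file names before launching the real thing in a loop over samples.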
Hi @drpatelh,

Thanks a lot for

I recently found out I could make use of such an option to hard-clip primers, and my case is a bit similar to @peterk87's, except:
The little I investigated,
Hope this helps!
Hi @tetedange13 ! Thank you for using the pipeline and for your comments!
The implementation suggested by Peter above would suit your needs, right? It removes soft-clipped regions from the iVar-trimmed BAM files and then converts back to FastQ.
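As background on why the clipping type matters here: soft-clipped bases are still stored in the SEQ field of a SAM record, so `samtools fastq` writes them back out, whereas hard-clipped bases are absent from SEQ. The toy awk filter below illustrates that difference on plain SAM text. It is only a sketch with a hypothetical function name (`upgrade_clipping`), not what fgbio does internally; it assumes SEQ and QUAL are present and does not touch tags or mate information.

```shell
# upgrade_clipping: read SAM text on stdin and turn leading/trailing soft
# clips (S) into hard clips (H), trimming SEQ and QUAL to match.
# Illustration only; it assumes SEQ/QUAL are not "*" and ignores tags.
upgrade_clipping() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                 # header lines pass through unchanged
    {
      cigar = $6; lead = 0; trail = 0
      # Leading soft clip, e.g. "22S" at the start of the CIGAR.
      if (match(cigar, /^[0-9]+S/)) {
        lead = substr(cigar, 1, RLENGTH - 1) + 0
        cigar = lead "H" substr(cigar, RLENGTH + 1)
      }
      # Trailing soft clip at the end of the CIGAR.
      if (match(cigar, /[0-9]+S$/)) {
        trail = substr(cigar, RSTART, RLENGTH - 1) + 0
        cigar = substr(cigar, 1, RSTART - 1) trail "H"
      }
      keep = length($10) - lead - trail  # bases left after clipping
      $6 = cigar
      $10 = substr($10, lead + 1, keep)  # SEQ without the clipped bases
      $11 = substr($11, lead + 1, keep)  # QUAL trimmed the same way
      print
    }'
}
```

For example, a record with CIGAR `3S5M2S` and SEQ `ACGTACGTAC` comes out with CIGAR `3H5M2H` and SEQ `TACGT`, so a FASTQ written from it no longer contains the clipped (primer) bases.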
Hi @drpatelh,

Yup, from what I saw, the produced FASTQ files are identical with both

Just to be more precise about Peter's implementation, I could not manage to make

fgbio SortBam --input=sample.ivar_trim.sorted.bam --sort-order=QueryName --output=/dev/stdout |
fgbio ClipBam --input=/dev/stdin --upgrade-clipping --output=/dev/stdout --ref=/viralrecon/data/GCA_009858895.3_ASM985889v3_genomic.200409.fna |
samtools sort -n |
samtools fastq -c 9 -0 /dev/null -s /dev/null -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz

Regarding performance,

Thanks again!
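Since the pipeline above sends singletons to `/dev/null`, the R1 and R2 files it writes should contain the same number of reads. A minimal sanity check along those lines might look like the sketch below; the function names `count_fastq_reads` and `pairs_in_sync` are hypothetical, and the check assumes plain four-line FASTQ records.

```shell
# count_fastq_reads FILE.fastq.gz: print the number of records,
# assuming the usual four lines per FASTQ record.
count_fastq_reads() {
  gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# pairs_in_sync R1 R2: succeed only if both files hold the same number of reads.
pairs_in_sync() {
  [ "$(count_fastq_reads "$1")" = "$(count_fastq_reads "$2")" ]
}
```

Usage would be along the lines of `pairs_in_sync sample_R1.fastq.gz sample_R2.fastq.gz || echo "read counts differ"`. Equal counts do not prove the mates are in the same order, so this only catches gross problems.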
Description of feature
Context in this Tweet. Used by @peterk87 @fmaguire in this pre-print.
Be great if you can add a little description as to why hard-clipping was preferred over soft-clipping and anything else that you found on your travels that would be useful to know for the implementation including the command you used :)
Methods