Fastcat freaks out in the wf-16s workflow when using a sample sheet/dorado to demultiplex #9

Open
billytcl opened this issue Jan 19, 2025 · 1 comment


I'm using a sample sheet with dorado (v0.9.0) to demultiplex a run and then feeding the output into wf-16s. The workflow fails with a confusing fastcat error:

ERROR ~ Error executing process > 'fastcat (1)'

Caused by:
  Process `fastcat (1)` terminated with an error exit status (1)


Command executed:

  mkdir fastcat_stats
  mkdir fastq_chunks
  
  # Save file as compressed fastq
  fastcat         -s barcode001         -f fastcat_stats/per-file-stats.tsv         -i fastcat_stats/per-file-runids.tsv         -l fastcat_stats/per-file-basecallers.tsv         --histograms histograms                           <( 
              samtools cat -b <(find . -name 'input_src*') |                 samtools fastq - -n -T '*' -o - -0 - 
            )     | if [ "0" = "0" ]; then
      bgzip -@ 4 > fastq_chunks/seqs.fastq.gz
    else
      split -l null -d --additional-suffix=.fastq.gz --filter='bgzip -@ 4 > $FILE' - fastq_chunks/seqs_;
    fi
  
  mv histograms/* fastcat_stats
  
  # get n_seqs from per-file stats - need to sum them up
  awk 'NR==1{for (i=1; i<=NF; i++) {ix[$i] = i}} NR>1 {c+=$ix["n_seqs"]} END{print c}'         fastcat_stats/per-file-stats.tsv > fastcat_stats/n_seqs
  # get unique run IDs (we add `-F '\t'` as `awk` uses any stretch of whitespace
  # as field delimiter per default and thus ignores empty columns)
  awk -F '\t' '
      NR==1 {for (i=1; i<=NF; i++) {ix[$i] = i}}
      # only print run_id if present
      NR>1 && $ix["run_id"] != "" {print $ix["run_id"]}
  ' fastcat_stats/per-file-runids.tsv | sort | uniq > fastcat_stats/run_ids
  # get unique basecall models
  awk -F '\t' '
      NR==1 {for (i=1; i<=NF; i++) {ix[$i] = i}}
      # only print basecall model if present
      NR>1 && $ix["basecaller"] != "" {print $ix["basecaller"]}
  ' fastcat_stats/per-file-basecallers.tsv | sort | uniq > fastcat_stats/basecallers

Command exit status:
  1

Command output:
  (empty)

Command error:
  **ERROR: Read's barcode number (2609) is greater than MAX_BARCODES (1025)**

Work dir:
  /mnt/ix1/Projects/M103_250118_nanopore16S/nextflow/work/5f/1e0e25c7d320b953140ae0430068f7

Container:
  /mnt/ix1/Projects/M103_250118_nanopore16S/nextflow/work/singularity/ontresearch-wf-common-shabadd33adae761be6f2d59c6ecfb44b19cf472cfc.img

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

The barcode number in the error is not an actual barcode number but the last four digits of our sample ID (e.g. PNNNN_NNNNN). At first I thought fastcat was being confused by my directory structure (e.g. samples/reads.bam for wf-16s), or by the soft link having that identifier in its name. After ruling out both, I found that the culprit is the BC tag in the BAM records (marked with ** below):

8d9a5f9a-1c1c-4767-9435-ef1200bc2207    4       *       0       0       *       *       0       0       ATGTTCCTGTACTTCGTTCAGTTACGTATTGCTGGTGCTGCTACTTACGAAGCTGAGGGACTGCTTAACCTTTCTGTTGGTGCTGATATTGCAGAGTTTGATCCTGGCTCAGATTGAACGCT
GGCGGCAGGCTTAACACATGCAAGTCGAACGGTAGCAGGAGAAAGCTTGCTCTCTTGCTGACGAGTGGCGGACGGGTGAGTAATGCTTGGGAATCTGGCTTATGAGGAGGATAACGACGGGAAACTGTCGCTAATACCGCGTATTATCGGAAGATGAAAGTGCGGGACTGAGAGGCCGCATGCCATAGGATGAGCCCAAGTGGGATTAGGTAGTTGGTGGGGTAAA
GGCCTACCAAGCCTGCGATCTCTAGCTGGTCTGAGAGGATGACCAGCCACGCTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGGGGAACCCTGACGCAGCCATGCCGCGTGAATGAAGAAGGCCTTAGGGTTGTAAAGTTCTTTCGGTATCGAGGAAGGTTGATGTGTTAATAGCACATCAAATTGACGTTAAAT
ACGGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATAACTGGGCGTAAAGGGCACGCGGGCGGTTATTTAAGTGAGGTGTGAAAGCCCTGGGCTTAACCTAGGAATTGCATTTCAGACTGGGTAACTAGAGTACTTTAGGGAGGGGTAGAATTCCACGTGTAGCGGCGAAATGCGTAGAGATGTGGAGGAA
TACCGAAGGCGAAGGCAGCCCCTTGGGAATGTACTGACGCTCATGTGCGAAAGCGTGGGGAGCAAACAGGATTAGGTACCCTGGTAGTCCACGCTGTAAACGCTGTCGATTTGGGGGTTGGGGTTTAACTCTGGCGCCCGTAGCTAACGTGATAAATCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTG
GAGCACGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATCCTAAGAAGAGCTCAGAGATGAGCTTGTGCCTTCGGGAACTTAGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGATTAGGTCGGGAACTCAGAGGAGACTGCCAGTGATA
AACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGTATACAGAGGGAAGCGAAGCTGCGAGGTGGAGCGAATCTCATAAAGTACGTCTAGGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCGAATCAGAATGTCGCGGTGAATACGTTCCCGGGCCTTGTAC
ACACCGCCCGTCACACCATGGGAGTGGGTTGTACCAGAAGTAGATAGCTTAACCTTTTGGAGGGCGTTTACCACGGTATGATTCATGGCTGGGGTGAAGTCGTAACAAGGTAACCGGAAGATAGAGCGACAGGCAAGTAGGTTAAGCAGTCCCTCAGCTTCGTAAGTAGCAGCACCAGCAAT  $$$$$%%%*-***,)+,01=EDC@=999:;=66666FEIEEF
IGFHFGFEFGHIHJSSSSIGSKMSSSSSSSSSSSSSLIJMSQNKHJHGSFIFFDCB44444KSHE::GFDBEEDEFGLMKNQSPSSHHKLMNSLIIIFSJSSSSSMQSLIJKNIKGC>??@NSSSSSSOSSSSSSSRSSQSSSSSSPNKSSSSKIILSIFEFGSMSMJFDCCNJSSLJFGIOJ0*)(&&''(++++,,,;>@SNSSKDGFGSHKSLSSLSMSKSSM
IJSSSSSKHHGISSSPSPSSSSSOSSSSSSSMSOSSNKSSNSSSSOSSSSSSSOLHSSSMMMSSSSSSROSSSSSSSSSSSSSB>>>>LLPNJISSMSSNSSSOSSSRSSOJSSSSSSMJIJKINMSSLHFGGDGSSMLSSSSPSSSSSNSK>MLSLOSSSOSSSSSSPKJGFEFHKSSMJSJIJIHFISSQNJPPSSSOSSSSSOSSGKKSRSSSSNSKSSSSQG
IIHOSSSSMJSIGJSSSLSMSSSSSSSSSIGJRKSMLSSSSNMSSSSSSSSSSSRLISSSNSSSRSSSSSPSSMISNQQNSSLSLJNNSSSHFGHHJKJSSSMSPSMLKKLSSSSSOSOSSSLHJISSSSSOHPSKLONMSSMSSPSSSSSKSJIISJFLSQSSSSSSJMSFHQSSSSKHLMMNLSRKPOIMSLSHIIG<;;:9444:<SOSOJILSSSOSSKSSS
SSSSSKKSHKMSDAAB878KSMHKIIJNSJIMSSSSOMMSSSSSLSKSSSONKIIHJNKKFGJEDBCEDDFKSMSSSSSSSMSSOSSSSSPNSSSIPOQSSSOEBBBGGCFFEFHOSSNMJIMSKSMSNSSSLISSSSSSSSKAEJKLSKLJE::DDCDIGGQSSQKISSPSQLIKSISJGMSSLSLFFFFFHMJFJLNEEB??AA=87566679>@A@CCCCFGS
SSOFJHGEFGHIKSSMKIGGFB@DDFKMJSQPSQSLKKKKKSSSSSSSLLSSQSSSSPSSNMSSSPOKQSRSRSLIMSSNSNKLMSHH@CHGSOKSSNLMJSSSSQMSPONIMSSSSSOSMJQMSNOKMLSMSKHE**)(**,(()?=>AAJSJMSSNKSSSSSSSSSSSSOJSSHLFFHJGSSSSSSSSQSSSOKKHMPSSSSSSSSSSSSMSOSSSQQIHKSSF
INSSSSSSSSSPSSPSSSSSSSSSMMSSSQSSSSSOJKSSMNSOSKLMSSSJQSSSSJIHGFFJSSSSMJIGGHGGISSLSSSSHSSSSSSSSSSSSSSSOSSPKKORSSSSSKNSSSSSSSSPSMSSSNSSSSSNSPSSSNSSLISSIGFIGSMSIFGGEFFFFSSSJGNSSSSGGGHFSSSOSSSSPSSSSSSHIGFFMSQSSSQSLSISSSPSSLIJMSMJPS
SSSGGGHGSSSPSQSSSSSSNSSSSSSSSPNSSSSSSSSJSFGISSSSSJSJNSNSSSSNKSSJCCSJPHJSSSJISN?77BCPLNNSSSSNSSNSSSSPSMISSLOSSSSSQOKIFFFECAABDCFGKHNSSSFHLSGSSSGHGFBAB?BHHFD>>=====A?>>>AAKPSSSSSQSSSPOSNSSSSSSSQSPMMSSSSSSLLFSSLPMIFGHHHSSSSSSQSSS
IFIQGFSSBAAA?EDCDEFSSSNK>;;2210//+++    **BC:Z:P6647_22609**        qs:f:25.202     du:f:4.9112     ns:i:24556      ts:i:10 mx:i:4  ch:i:1402       st:Z:2024-11-08T18:13:28.190+00:00      rn:i:121271     fn:Z:PAY05180_6a31b7a5_6d0
4f03f_43.pod5   sm:f:728.842    sd:f:125.658    sv:Z:pa dx:i:0  RG:Z:6d04f03f1d80862f6d7b82d9fbe1ab199895dcfa_dna_r10.4.1_e8.2_400bps_sup@v5.0.0_EXP-PBC096_barcode35
eb45aa5b-5d3d-4967-a14d-4d4abc883dd5    4       *       0       0       *       *       0       0       CTATGTCTGCGCTCGTTGCTGACGACGTTACGTATTCCTGGTGCTGCTACTTACGAAGCTGAGGGACTGCTTAACCTTTCTGTTGGTGCTGATATTGCAGAGTTTGATCCTGGCTCAGAACA
AACACTGGCGGCATGCCTAACACATGCAAGTCGAACGAGCCCTTCGGGGTTAGTGGCACACGGGTGCGTAACACGTGGGAATCTGCCCTTGGGTTCGGAATAACTCGCCGAAAGGCGTGCTAATACCGGATGATGTCGAAAGACCAAAGATTTATCGCCCAAGGATGAGCCCGCGTAAGATTAGGTAGTTGGTGAGGTAAAGGCTCACCAAGCCGACGATCTTTAG
CTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGAGCAAAGCCTGATCCAGCAATGCCGCGTGAGTGATGAAGGCCTTAGGGTTGTAAAGCTCTTTTACCCGGGATGATAATGACAGTACCGGGAGAATAAGTTCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGA
GCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGCTTTGTAAGTCAGAGGTGAAAGCTGGAGCTCAACTCCAGAACTGCCTTTGAGACTGCATCGCCTGAATCCAGGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGGCTCACTGGACTGGTATTGACGCTGAGGTGCGAAAGC
GTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGATAACTAGCTGTCCGGGCACTTGGTGCTTGGGTGGCGCAGCTAACGCATTAAGTTATCCGCCTGGGGAGTATGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGGCCTGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGCAGAACCTTACCAGCGTTTGACA
TGTCCGGACGATTTCCAGAGATGGATCTCTTCCCCCTCGGGGACTGGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGCCTTTTAACCCCATCATTTAGTTGGGGACTCTAAAGGAACCGCCGGTGATAAGCCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTACGTGCTG
GGCTACACACGTGCATCAATGGCGGTGACAGTGGGCAGCAAACTCGCGAGAGTGCGCTAATCTCCAAAAGCCGTCTCAGTTCGGATTGTTCTCTGCAACTCGAGCCCATGAAGTTGGAATCGCTAGTAATCGTGGATCAGCACGCCACGGTGGAAACTCGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGGTTTTACCTGAAGACGGTGCGCT
AACCGAAAGGGGGCAGCCGGCCACGGTAGGGTCAGCGACTGGGGTGAAGTCGTAACAAGGTAACCCGAAGATAGAGCGACAGGCAAGTAGGTTAAGCAGTCCCTCAGCTTCGTAAGTAGCAGCACCAGCAATACGTA       %%%&'*)(&%%%&%%%%%&'''('&&&&&&)+559955667ADG99<EDCCEGGGEEEBB999>FHILLSH8557O??D;;D
CCJKLJSSLPLJLSFCHSKISIIJJSSNJSSJKHII>>=0....0//032332237:@665565542330/-+++,())*2::E::BDDKSMJIJSNHSHHSSSSSSJKSLS9:99:HHHGHEEFCB=8=88<>D>??>=<;::8666>@DDSSSSSSPSSJHFKMLSKNJSSSLHMLSSSSSSSECCCDNNOSSKLSSSSSSSOLGKJHKLDCCCOSNJOKDDDE
KDCCHGSNSPIGSSSSSSSOLSSK999999D;;::;JD5:;;IGPE@JB;BCBNGIILHFJMGGSSQIC:7421,((('))()()(***.<<HJKMSSSOLMKSMSSLOSSSKHKOLNIJJCDB53334555AIF222.....78=<=A@AKIDCCCEGHKLSSSMIPJGGIGGSSSNSLLSSIMJKECCBB9DILSNSLSMISSS@>?<:?;;<ESRSSGFI---
--<SSOSLISSNSSKIDEDFHIMGMD@@89999CCCFPSKJOH975210(((((89<KQSSQFGFIQSOSSLIFFFGFSISMINLIDDD@@S===9;;;;;FGKIF>6+++3=BC<::::9@?CHGGGSSICFF-???HHMSKJ====>SRKMLSMPSSLCCDOG???7444372223555:BFGFIHEJSSSSJMKGFGDSMS;:::;DEKJPLIQSSSSSNSSS
MMLJLSS???>>DDGEEDHDCCDCHLIHISMJSSMSSPISSSSMQNSMG=<<;FHLJLHLHRS98887CEENSRSSSSOPJNSSSMISSSSMJNSNDDEEDMSHGFJJKKJHGCEDDCBA@ACJ=====AAALOSPMJSSSSLSNROLMSJHIHIMKLFLKNSAAAABEGDIJSS<<SLNHF=DBA><:66447CEIGSSSSNSMSLSSSOKJSSSSSLIQSQSSS
SKMJKLSPMSSJINLSEDB311212;;==ISSSSJJSSSMHJC@@;;GM<<<<<IILPMJNQFEEEAA:::D@>AAB32222C;;;;SSSMSOSNNLOKNSSSSSMJMHHPSQSSSSQSSSSQOMSSSMJMSJ<:978SSROSSSSSOMISSSSOSLMSLR>@988723/)+*****,<JIMSIHCCA=GFHNSNS@@@@@SSSSOSSNSSNSSOSSSSPSSPSSS
SJJSKPSSSJKMA<=:=ECCF7...<HJMIQHGFGHISSSSSLJKKJN//*&'))'&&&'0369@AEFLKJSHJHMKKKFEEDG@5555BBSFFEDBB99999BCAGSSSSSLKJGKGMNK;::::JIM=7446675559A@?=====SOOLLPSSSSSSRSMLPLKNJSJOSGEECGFFKGM===<=;9961*))*))**,0197>SSSNSSSOSKSSLJSSSSS
SSMOJNSPL<:::655---*))))*78;>>FKKIIHGOOLSSJHSSSRKMNKSSKNSSKILO??>>@@>==?+++++;=AAGGGHNSSKINMSOKSGIGGSQSISSKG<:::<8878000.---/0<<<666612GH,+++-/0''(@CIJJLIC?@;;;;AB@A633458:66.-'       **BC:Z:P6647_22609**        qs:f:17.8067    du
:f:4.1004       ns:i:20502      ts:i:0  mx:i:3  ch:i:2946       st:Z:2024-11-08T18:13:22.454+00:00      rn:i:-1 fn:Z:PAY05180_6a31b7a5_6d04f03f_43.pod5 sm:f:740.842    sd:f:125.658    sv:Z:pa dx:i:0  RG:Z:6d04f03f1d80862f6d7b8
2d9fbe1ab199895dcfa_dna_r10.4.1_e8.2_400bps_sup@v5.0.0_EXP-PBC096_barcode35     pi:Z:06a4e5e2-1a79-496e-9150-a35486b7689a       sp:i:0

Dorado used the sample sheet to replace the barcode name with the alias in the BC tag, and fastcat then appears to read a barcode number out of that alias. Rerunning dorado demux without a sample sheet fixes the issue.
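
For anyone wanting to check their own files for the same thing, here is a rough sketch that tallies the BC tag values in SAM text and flags any that look like an alias. The function name `bc_tag_counts` is made up for illustration, and the rule "an unaliased value ends in `barcode` followed by digits" is my assumption based on the RG tag in the records above, not something taken from the dorado or fastcat documentation.

```shell
# Tally BC:Z tag values from SAM text on stdin.
# ASSUMPTION: an unaliased BC value ends in "barcode<digits>" (as the RG tag
# in the records above does); anything else is flagged as a possible alias.
bc_tag_counts() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      # Optional SAM tags start at column 12.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^BC:Z:/) { counts[substr($i, 6)]++ }
      }
    }
    END {
      for (v in counts) {
        flag = (v ~ /barcode[0-9]+$/) ? "ok" : "ALIAS?"
        printf "%s\t%d\t%s\n", v, counts[v], flag
      }
    }
  ' | sort
}
```

With samtools available this could be run as `samtools view reads.bam | bc_tag_counts`. For the records above I would expect `P6647_22609` to come out flagged as `ALIAS?`.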

cjw85 (Contributor) commented Jan 20, 2025

I think this will need to be raised internally as an issue in dorado as it should not be mixing up the barcode name and the barcode alias fields; the items should remain distinct.

Could you share a snippet of the file you are inputting into the workflow around the region of this read, along with an illustration of your directory layout?
