bamtofastq generated 3 reads with triplicate of each reads #134

Davidwei7 · 2023-04-09T18:14:37Z

Dear Sir/Madam,

Hope you are well.

I found that SRR7092170 has an original BAM file (link: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR7092170&display=data-access), so I downloaded it and converted it with bamtofastq. My command and its output were:

bamtofastq_linux /lustre/miifs01/project/m2_jgu-canshank3/Individual_Processing/Xiang_try_bam/YX_05.bam /lustre/miifs01/project/m2_jgu-canshank3/Individual_Processing/Xiang_try_bam/output
bamtofastq v1.4.1
Writing finished. Observed 127722890 read pairs. Wrote 127722890 read pair.

After the bamtofastq process, I obtained a full list of fastq files looking like this:

bamtofastq_S1_L001_I1_001.fastq.gz
bamtofastq_S1_L001_I1_002.fastq.gz
bamtofastq_S1_L001_I1_003.fastq.gz
bamtofastq_S1_L001_R1_001.fastq.gz
bamtofastq_S1_L001_R1_002.fastq.gz
bamtofastq_S1_L001_R1_003.fastq.gz
bamtofastq_S1_L001_R2_001.fastq.gz
bamtofastq_S1_L001_R2_002.fastq.gz
bamtofastq_S1_L001_R2_003.fastq.gz
bamtofastq_S1_L001_R3_001.fastq.gz
bamtofastq_S1_L001_R3_002.fastq.gz
bamtofastq_S1_L001_R3_003.fastq.gz

Please see the first few lines of these files:

bamtofastq_S1_L001_I1_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 4:N:0:0
AACGACAC
+
CCDDBIII
@D00536:344:HFYLCBCXY:1:1210:1863:95823 4:N:0:0
CGTCCTCT
+
DDDDAGHH
@D00536:344:HFYLCBCXY:1:2113:4071:88090 4:N:0:0
TTGATGGG

bamtofastq_S1_L001_R1_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 1:N:0:0
AATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAGTCATTCTCAACACCGGCCATGCAGCAAAATCATCAGTGGAAA
+
DDDDDHIEGHIIIIIHHHIHHHIIIIIIIIIIGHFHHIIGIIIIIGGIIIIIDHHHIIHHHHHHHIHIHFEDHHIFH?1FECG@GHGGGHHHHHIHHH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 1:N:0:0
AAAGAAAAATGGTGAATGATACCCGGTGCTGGCAATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAGTCATTCTCA
+
DDCDCHIIHEGCCC1GCEEHHHFHHHIHHII1CCEHGGIIH1EGCHHHHHHHIGHIIHHHEGHIIFIGGHIIIIHIHIIIEHHHHHDHCCFGHHH?C1
@D00536:344:HFYLCBCXY:1:2113:4071:88090 1:N:0:0
AGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGCTGGCAATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAG

bamtofastq_S1_L001_R2_001.fastq.gz:

ATAACATGACCAAC
+
ADA@DIIFI?<FHH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 2:N:0:0
CACGCTACAGATGA
+
DDDDAICCHIIHIE
@D00536:344:HFYLCBCXY:1:2113:4071:88090 2:N:0:0
CACGCTACAGATGA

bamtofastq_S1_L001_R3_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 3:N:0:0
GTAGGCAACA
+
DDDDCIIIIH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 3:N:0:0
CGCAAAATAA
+
DDDDDIIIII
@D00536:344:HFYLCBCXY:1:2113:4071:88090 3:N:0:0
CGCAAAATAA

I am confused about why there is an R3 file. Did read 1 get split into two reads? (The lengths of R2 and R3, 14 bp and 10 bp, add up to 24 bp.) And why are there R1 to R3 files, with every read and index file split into three parts: 001, 002, and 003?

I am hoping that I have described my problem sufficiently for a response to solve my issue.

Thank you in advance, and thank you for developing this tool, and looking forward to your response.

Best Wishes,

David

gandreeva18 · 2023-08-21T17:44:56Z

I second this question. I see some files are in quadruplicate form?

Also, I do not see the index files. Is there something I did wrong?

My original command: cellranger bamtofastq *.bam output_folder

TIA

gandreeva18 · 2023-08-22T13:37:33Z

David,

The reason multiple FASTQ files are present is that the default reads-per-FASTQ setting for the bamtofastq conversion is 50000000. If you want a single FASTQ file each for I1, R1, and R2, you can set the --reads-per-fastq=N argument to a larger value.

Source: I independently confirmed this by setting --reads-per-fastq=500000000, after which only one FASTQ file per read appeared. I also received confirmation from 10x Genomics.
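
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the numbers quoted in this thread (127722890 read pairs, the 50000000 default, and the 500000000 value I tested). The function name `expected_chunks` is my own, and the code assumes that each chunk is simply filled up to the limit before the next one starts. That is an assumption on my part, not something taken from the bamtofastq source, and the tool may split files further for other reasons.

```python
import math

# Figures quoted in this thread: the read-pair count reported by bamtofastq
# and the default --reads-per-fastq value.
observed_read_pairs = 127_722_890
default_reads_per_fastq = 50_000_000


def expected_chunks(read_pairs: int, reads_per_fastq: int) -> int:
    """Estimate the number of FASTQ chunks per read type.

    Assumption (not confirmed from the tool's source): chunks are filled
    one after another, each holding at most `reads_per_fastq` records.
    """
    return math.ceil(read_pairs / reads_per_fastq)


chunks_default = expected_chunks(observed_read_pairs, default_reads_per_fastq)
chunks_large = expected_chunks(observed_read_pairs, 500_000_000)

print(f"default limit: {chunks_default} chunk(s) per read type")
print(f"limit of 500000000: {chunks_large} chunk(s) per read type")
```

Under that assumption the default limit gives three chunks per read type, which matches the 001, 002, and 003 files in the original post, and the larger limit gives one.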
