bamtofastq generated 3 reads with triplicate of each reads #134

Open
Davidwei7 opened this issue Apr 9, 2023 · 2 comments

@Davidwei7

Dear Sir/Madam,

Hope you are well.

I found that SRR7092170 has an original BAM file available (link: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR7092170&display=data-access), so I downloaded it and converted it with bamtofastq. My command and its output were:
bamtofastq_linux /lustre/miifs01/project/m2_jgu-canshank3/Individual_Processing/Xiang_try_bam/YX_05.bam /lustre/miifs01/project/m2_jgu-canshank3/Individual_Processing/Xiang_try_bam/output
bamtofastq v1.4.1
Writing finished. Observed 127722890 read pairs. Wrote 127722890 read pair.

After the bamtofastq run finished, I obtained the following FASTQ files:

  • bamtofastq_S1_L001_I1_001.fastq.gz
  • bamtofastq_S1_L001_I1_002.fastq.gz
  • bamtofastq_S1_L001_I1_003.fastq.gz
  • bamtofastq_S1_L001_R1_001.fastq.gz
  • bamtofastq_S1_L001_R1_002.fastq.gz
  • bamtofastq_S1_L001_R1_003.fastq.gz
  • bamtofastq_S1_L001_R2_001.fastq.gz
  • bamtofastq_S1_L001_R2_002.fastq.gz
  • bamtofastq_S1_L001_R2_003.fastq.gz
  • bamtofastq_S1_L001_R3_001.fastq.gz
  • bamtofastq_S1_L001_R3_002.fastq.gz
  • bamtofastq_S1_L001_R3_003.fastq.gz
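The listing above is easier to read once each file name is broken into its fields. The following is a minimal sketch, assuming the names follow the Illumina-style pattern `<sample>_S<n>_L<lane>_<read>_<chunk>.fastq.gz`; the regular expression and the `group_chunks` helper are illustrative and not part of bamtofastq. It shows that the twelve files are four read types (I1, R1, R2, R3), each split into chunks 001 to 003, rather than three copies of the same data.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <sample>_S<n>_L<lane>_<read>_<chunk>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_num>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[IR]\d)_(?P<chunk>\d{3})\.fastq\.gz$"
)


def group_chunks(filenames):
    """Map each read type (I1, R1, ...) to its sorted list of chunk numbers."""
    groups = defaultdict(list)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[match.group("read")].append(int(match.group("chunk")))
    return {read: sorted(chunks) for read, chunks in sorted(groups.items())}


# The twelve file names from the listing above.
files = [
    f"bamtofastq_S1_L001_{read}_{chunk:03d}.fastq.gz"
    for read in ("I1", "R1", "R2", "R3")
    for chunk in (1, 2, 3)
]
chunks_by_read = group_chunks(files)
print(chunks_by_read)
```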

Please see the first few lines of these files:

bamtofastq_S1_L001_I1_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 4:N:0:0
AACGACAC
+
CCDDBIII
@D00536:344:HFYLCBCXY:1:1210:1863:95823 4:N:0:0
CGTCCTCT
+
DDDDAGHH
@D00536:344:HFYLCBCXY:1:2113:4071:88090 4:N:0:0
TTGATGGG

bamtofastq_S1_L001_R1_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 1:N:0:0
AATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAGTCATTCTCAACACCGGCCATGCAGCAAAATCATCAGTGGAAA
+
DDDDDHIEGHIIIIIHHHIHHHIIIIIIIIIIGHFHHIIGIIIIIGGIIIIIDHHHIIHHHHHHHIHIHFEDHHIFH?1FECG@GHGGGHHHHHIHHH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 1:N:0:0
AAAGAAAAATGGTGAATGATACCCGGTGCTGGCAATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAGTCATTCTCA
+
DDCDCHIIHEGCCC1GCEEHHHFHHHIHHII1CCEHGGIIH1EGCHHHHHHHIGHIIHHHEGHIIFIGGHIIIIHIHIIIEHHHHHDHCCFGHHH?C1
@D00536:344:HFYLCBCXY:1:2113:4071:88090 1:N:0:0
AGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGCTGGCAATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAG

bamtofastq_S1_L001_R2_001.fastq.gz:
ATAACATGACCAAC
+
ADA@DIIFI?<FHH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 2:N:0:0
CACGCTACAGATGA
+
DDDDAICCHIIHIE
@D00536:344:HFYLCBCXY:1:2113:4071:88090 2:N:0:0
CACGCTACAGATGA

bamtofastq_S1_L001_R3_001.fastq.gz

@D00536:344:HFYLCBCXY:1:1114:4988:61338 3:N:0:0
GTAGGCAACA
+
DDDDCIIIIH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 3:N:0:0
CGCAAAATAA
+
DDDDDIIIII
@D00536:344:HFYLCBCXY:1:2113:4071:88090 3:N:0:0
CGCAAAATAA

I am confused about why there is an R3. Did read 1 get split into two reads? (The lengths of R2 and R3 add up to 24 base pairs.) And why are there R1 to R3 files, with every read and index file split into three parts: 001, 002 and 003?
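The length arithmetic can be checked directly from the excerpts. This is a minimal sketch that parses uncompressed FASTQ text held in a string; the `sequence_lengths` helper is hypothetical, and the two records are the second reads shown above for R2 and R3. It only confirms that 14 + 10 = 24; what R2 and R3 represent depends on the library chemistry, which the excerpts alone do not establish.

```python
def sequence_lengths(fastq_text):
    """Return the length of each sequence line in uncompressed FASTQ text."""
    lines = [line.strip() for line in fastq_text.strip().splitlines()]
    # A FASTQ record is 4 lines: header, sequence, '+', qualities.
    return [len(lines[i + 1]) for i in range(0, len(lines) - 1, 4)]


# Second record from each excerpt above (same read ID in both files).
r2_record = """@D00536:344:HFYLCBCXY:1:1210:1863:95823 2:N:0:0
CACGCTACAGATGA
+
DDDDAICCHIIHIE"""

r3_record = """@D00536:344:HFYLCBCXY:1:1210:1863:95823 3:N:0:0
CGCAAAATAA
+
DDDDDIIIII"""

r2_len = sequence_lengths(r2_record)[0]
r3_len = sequence_lengths(r3_record)[0]
print(r2_len, r3_len, r2_len + r3_len)
```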

I hope I have described my problem clearly enough for you to help.

Thank you in advance, and thank you for developing this tool. I look forward to your response.

Best Wishes,

David

@gandreeva18

[Screenshot attached: Screen Shot 2023-08-21 at 1 38 21 PM]

I second this question. I see that some of my files come in quadruplicate.

Also, I do not see the index files. Did I do something wrong?

Original command: cellranger bamtofastq *.bam output_folder

TIA

@gandreeva18

David,

The reason multiple FASTQ files are present is that the default value of the reads-per-fastq option for the bamtofastq conversion is 50000000. Your run wrote 127722890 read pairs, which is more than two full chunks of 50000000, so the output was split into three files per read. If you want a single FASTQ file for each of I1, R1, and R2, you can set the --reads-per-fastq=N argument to a larger value.

Source: I independently confirmed this by setting --reads-per-fastq=500000000, after which only one FASTQ file per read appeared. I also received confirmation from 10x Genomics.
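To see how the chunk count follows from the numbers in this thread, here is a minimal sketch. It assumes that each chunk is filled up to the --reads-per-fastq limit before the next one is started, so the count is a ceiling division; the `expected_chunks` helper is hypothetical and not part of bamtofastq. Other factors not modeled here may also split the output, which could be why some runs produce a different number of files.

```python
def expected_chunks(read_pairs, reads_per_fastq=50_000_000):
    """Estimate how many FASTQ chunks one read type is split into.

    Assumption: chunks are filled to the limit one after another,
    so the count is ceil(read_pairs / reads_per_fastq).
    """
    if reads_per_fastq <= 0:
        raise ValueError("reads_per_fastq must be positive")
    return -(-read_pairs // reads_per_fastq)  # ceiling division on integers


# Read-pair count reported by bamtofastq in the original post.
observed_pairs = 127_722_890

default_chunks = expected_chunks(observed_pairs)
larger_limit_chunks = expected_chunks(observed_pairs, reads_per_fastq=500_000_000)
print(default_chunks, larger_limit_chunks)
```

With the default limit this gives three chunks, matching the 001 to 003 files in the original post, and with --reads-per-fastq=500000000 it gives one.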
