“genome chunk count: 0” in human genome #19

Open
YKasama opened this issue Nov 16, 2023 · 0 comments

Hello,

I'm trying to run an analysis with tebreak, but it doesn't work on my own data.
The sample data was processed successfully.

The command executed is as follows.

tebreak -b test.bam \
    -r /data2/pub_data/hg19/chromosomes/hg19.fa \
    -p 30 \
    -d /home/gene/work/hg19.te.Alu_disctgt.txt \
    -m /home/gene/TE/TEBreak/tebreak/lib/hg19.chr.centromere_telomere.bed \
    --max_ins_reads 500 \
    -i /home/gene/work/test.Alu_RepBase2.fa

The execution log is as follows.

loading bwa index /data2/pub_data/mm10/mm10_all.fa into shared memory ...
loaded.
discordant targets in: /home/gene/work/hg19.te.Alu_disctgt.txt
genome chunk count: 0 <-- 

About the input files:
BED file of masked regions: hg19.chr.centromere_telomere.bed
I used a version of the file in which every chromosome name starts with "chr".
The original hg19.centromere_telomere.bed file had "chr" on only some of its lines.
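
For illustration, the "chr" normalization described above could be done along these lines. This is a minimal sketch, not part of tebreak: the function name `add_chr_prefix` is hypothetical, and it assumes a tab-delimited BED file in which the only difference between lines is a missing "chr" prefix.

```python
def add_chr_prefix(bed_line: str) -> str:
    """Return the BED line with a 'chr' prefix on the chromosome column.

    Hypothetical helper for illustration only. It does not rename
    special contigs (for example, 'MT' would become 'chrMT', not 'chrM').
    """
    # Leave blank lines and BED header lines untouched.
    if not bed_line.strip() or bed_line.startswith(("#", "track", "browser")):
        return bed_line
    # Split off the first tab-delimited column (the chromosome name).
    chrom, sep, rest = bed_line.partition("\t")
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return chrom + sep + rest
```

Applying it to every line of the BED file and writing the result to a new file yields a file with consistently prefixed names, leaving already-prefixed lines unchanged.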

Discordant mate-linked targets: hg19.te.Alu_disctgt.txt
I want to target all Alu elements, so I created this file
with the following command.

$ cat /tmp/rmsk//.out |
grep 'Alu' | awk '{print $5"\t"$6"\t"$7"\t"$11"\t"$10"\t"$9}' |
sed -e 's/C$/-/' > hg19.te.Alu_disctgt.txt

The contents of this file are as follows.
chr10 61181 61345 SINE/Alu FRAM -
chr10 67261 67438 SINE/Alu AluSp +
chr10 71658 71935 SINE/Alu AluJr +
chr10 72171 72462 SINE/Alu AluSg4 -
chr10 77316 77627 SINE/Alu AluSp +
chr10 97072 97325 SINE/Alu AluSx1 -
chr10 99058 99367 SINE/Alu AluSz -
   :
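
For reference, the grep/awk/sed pipeline above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `rmsk_to_target` is hypothetical, and the input is assumed to be a whitespace-delimited RepeatMasker .out line with the same column order that the awk command uses (columns 5 to 7 for the coordinates, 9 for the strand, 10 for the repeat name, 11 for the class/family).

```python
from typing import Optional


def rmsk_to_target(line: str, keyword: str = "Alu") -> Optional[str]:
    """Convert one RepeatMasker .out line to a tab-delimited target line.

    Hypothetical helper mirroring the pipeline in the post:
    grep 'Alu' | awk '{print $5,$6,$7,$11,$10,$9}' | sed 's/C$/-/'
    Returns None for lines that do not match or are too short.
    """
    # Same filter as grep: keep lines containing the keyword anywhere.
    if keyword not in line:
        return None
    fields = line.split()  # like awk, ignores leading whitespace
    if len(fields) < 11:
        # Unlike the awk command, this sketch skips short lines.
        return None
    chrom, start, end = fields[4], fields[5], fields[6]
    # RepeatMasker writes 'C' for the complement strand; map it to '-'.
    strand = "-" if fields[8] == "C" else fields[8]
    return "\t".join([chrom, start, end, fields[10], fields[9], strand])
```

With an illustrative input line whose score columns are invented for the example, such as `788 20.1 3.0 0.0 chr10 61181 61345 (135473402) C FRAM SINE/Alu (12) 164 1 1`, the function returns the columns `chr10`, `61181`, `61345`, `SINE/Alu`, `FRAM`, `-`, in the same layout as the first row of the listing above.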

I have two questions.
(1) If the centromere_telomere.bed and disctgt.txt files are both for the human genome,
does it matter whether "chr" is added to the chromosome names or not?
In my runs, the result is the same whether or not the entries have "chr".

(2) Why is "genome chunk count: 0" reported, and why does processing not proceed?
Please advise where the problem is.
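
One way to narrow down question (1) is to compare the chromosome names used in each input file against those in the reference FASTA. The sketch below is only a diagnostic aid with hypothetical helper names (`chrom_names`, `fasta_names`, `missing_from_reference`); it makes no claim about how tebreak computes its genome chunks. Note also that the log above reports loading the index for /data2/pub_data/mm10/mm10_all.fa while the command passes /data2/pub_data/hg19/chromosomes/hg19.fa to -r; whether that reflects the actual run or a paste error is not clear from the post.

```python
def chrom_names(lines):
    """Collect first-column names from BED-like or target-file lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        names.add(line.split()[0])
    return names


def fasta_names(lines):
    """Collect sequence names from FASTA header lines ('>name ...')."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def missing_from_reference(ref_names, other_names):
    """Return names used in another file that the reference lacks."""
    return sorted(other_names - ref_names)
```

Each function accepts any iterable of lines, so an open file handle can be passed directly. If a samtools-style .fai index exists next to the reference, its first column holds the same sequence names and could be passed to `chrom_names` instead of scanning the whole FASTA. An empty list from `missing_from_reference` means every name in the BED or target file also appears in the reference.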

Regards,
