Indel support #3
Hi Geparada,
Thanks for your interest in using DeltaSplice. I believe our current user interface does not deal with indels or oligonucleotide changes. We will try to address this issue in a future version.
Chaolin
… On Nov 13, 2024, at 5:39 PM, geparada ***@***.***> wrote:
Congrats on your paper! DeltaSplice looks very promising.
I was testing pred_deltassu.py using my own toy data and noticed that this script works well for any single-nucleotide variant (SNV), but it doesn’t handle variants involving more than one nucleotide. For example, with the following file based on hg38 coordinates:
chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,G,A,-,26771074,26771089,0.9339399999999999,0.9339399999999999
chr10,26771113,G,A,-,26771074,26771089,0.9339399999999999,0.9339399999999999
everything works well, and I get valid predictions. However, if I try to evaluate the following dinucleotide substitution variant:
chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,GG,AA,-,26771074,26771089,0.9339399999999999,0.9339399999999999
I get the following assertion error:
Traceback (most recent call last):
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 240, in <module>
    main()
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 162, in main
    assert seq[mut_pos-seq_start]==ref.upper()
AssertionError
This indicates that seq[mut_pos-seq_start] doesn’t match the reference, likely because the script only expects single-nucleotide variants and does not account for the actual length of the ref sequence.
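To make the suspected cause concrete, here is a minimal sketch of a length-aware reference check. It is not taken from DeltaSplice: the function name `ref_matches` is hypothetical, and it assumes `seq_start` is the coordinate of `seq[0]` in the same coordinate system as `mut_pos`, and that `seq` and `ref` are given on the same strand.

```python
def ref_matches(seq: str, mut_pos: int, seq_start: int, ref: str) -> bool:
    """Compare the whole ref allele against the window, not a single base.

    Hypothetical helper; assumes seq[0] sits at coordinate seq_start.
    """
    offset = mut_pos - seq_start
    # Slice len(ref) bases so multi-nucleotide ref alleles are compared in full.
    return seq[offset:offset + len(ref)].upper() == ref.upper()


window = "ACGGT"  # toy sequence, not real hg38 data

# Indexing one base can never equal a two-base ref allele.
single_base_check = window[2] == "GG"

# The slice-based check handles both SNVs and longer ref alleles.
snv_ok = ref_matches(window, 102, 100, "G")
mnv_ok = ref_matches(window, 102, 100, "GG")
```

With a check like this, `ref="GG"` would pass whenever the two reference bases really are present at that position, while a single-base comparison fails for any ref allele longer than one nucleotide.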
Additionally, when I try to evaluate any indel, such as this insertion:
chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,G,AA,-,26771074,26771089,0.9339399999999999,0.9339399999999999
I encounter the following error:
Traceback (most recent call last):
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 240, in <module>
    main()
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 221, in main
    pred = [m.predict(d, use_ref=True) for m in Models]
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 221, in <listcomp>
    pred = [m.predict(d, use_ref=True) for m in Models]
  File "/fs01/home/geparada/DeltaSplice/deltasplice/models/delta_pretrain.py", line 222, in predict
    torch.cat([X, mutX], 0), torch.cat([X, X], 0), exp)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 35000 but got size 35001 for tensor number 1 in the list.
And if I try to evaluate deletions like this one:
chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,GG,G,-,26771074,26771089,0.9339399999999999,0.9339399999999999
I get the assertion error again:
Traceback (most recent call last):
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 240, in <module>
    main()
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 162, in main
    assert seq[mut_pos-seq_start]==ref.upper()
AssertionError
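The size mismatch in the insertion case (35001 versus 35000) suggests the mutated sequence simply grows by the extra base, and a deletion would shrink it once it got past the reference check. One possible way to keep the reference and mutated windows the same length is sketched below. This is an assumption about how indels could be handled, not DeltaSplice's method: `apply_variant_fixed_length` is a hypothetical name, trimming or padding at the right end is only one of several possible conventions, and bases downstream of an indel shift position, so per-position predictions would still need to be realigned.

```python
def apply_variant_fixed_length(seq: str, offset: int, ref: str, alt: str,
                               pad: str = "N") -> str:
    """Apply ref->alt at a 0-based offset and return a string of len(seq).

    Hypothetical helper: insertions are trimmed at the right end,
    deletions are padded at the right end with `pad`.
    """
    if seq[offset:offset + len(ref)].upper() != ref.upper():
        raise ValueError("ref allele does not match the sequence at offset")
    # Splice the alt allele in place of the full ref allele.
    mutated = seq[:offset] + alt.upper() + seq[offset + len(ref):]
    if len(mutated) > len(seq):
        # Insertion: drop the overflow so the length is unchanged.
        return mutated[:len(seq)]
    # Deletion (or equal length): pad back up to the original length.
    return mutated + pad * (len(seq) - len(mutated))


window = "ACGGT"  # toy sequence, not real hg38 data
substitution = apply_variant_fixed_length(window, 2, "GG", "AA")
insertion = apply_variant_fixed_length(window, 2, "G", "AA")
deletion = apply_variant_fixed_length(window, 2, "GG", "G")
```

All three results have the same length as `window`, so the reference and mutated encodings could be concatenated along dimension 0 without a size error.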
I am very interested in testing your tool further with our data, but we need support for evaluating a wide range of variants beyond simple SNVs. Both SpliceAI and Pangolin handle indels, but your model appears even more powerful, and it would be fantastic if it could also support indels and other types of substitutions.
I hope you’ll consider extending your tool to accommodate these types of variants.
Best regards!
Hello Chaolin, Thanks a lot for considering addressing this issue in a future version of DeltaSplice. Guillermo