Indel support #3

Open
geparada opened this issue Nov 13, 2024 · 2 comments

Comments

@geparada

Congrats on your paper! DeltaSplice looks very promising.

I was testing pred_deltassu.py using my own toy data and noticed that this script works well for any single nucleotide variant (SNV), but it doesn’t handle variants involving more than one nucleotide. For example, with the following file based on hg38 coordinates:

chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,G,A,-,26771074,26771089,0.9339399999999999,0.9339399999999999
chr10,26771113,G,A,-,26771074,26771089,0.9339399999999999,0.9339399999999999

everything works well, and I get valid predictions. However, if I try to evaluate the following dinucleotide substitution variant:

chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,GG,AA,-,26771074,26771089,0.9339399999999999,0.9339399999999999

I get the following assertion error:

Traceback (most recent call last):
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 240, in <module>
    main()
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 162, in main
    assert seq[mut_pos-seq_start]==ref.upper()
AssertionError

This indicates that seq[mut_pos-seq_start] doesn’t match the reference, likely because the script indexes a single base and compares it with the whole ref string, so any ref allele longer than one nucleotide fails the check regardless of the underlying sequence.
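To illustrate the point, here is a minimal sketch of a length-aware reference check. It assumes that `seq` is a plain string on the same strand as `ref` and that `mut_pos` and `seq_start` share one coordinate convention; I have not verified how pred_deltassu.py handles strand or coordinates, and `ref_matches` is a hypothetical helper, not part of DeltaSplice. The sequence is a toy string, not real hg38 sequence.

```python
def ref_matches(seq: str, seq_start: int, mut_pos: int, ref: str) -> bool:
    """Return True if the whole ref allele matches seq starting at mut_pos."""
    offset = mut_pos - seq_start
    # Compare a slice as long as ref instead of a single base.
    return seq[offset:offset + len(ref)].upper() == ref.upper()


# Toy window (assumption: not real genomic sequence).
seq = "ACGGTC"
seq_start = 100

# The single-base comparison fails for a two-base ref even though
# the sequence does contain "GG" at that position.
print(seq[102 - seq_start] == "GG")            # False
print(ref_matches(seq, seq_start, 102, "GG"))  # True
print(ref_matches(seq, seq_start, 102, "G"))   # True
```

A check of this shape would accept both the GG>AA substitution and the GG>G deletion, provided the reference sequence really contains GG at that position.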

Additionally, when I try to evaluate any indel, such as this insertion:

chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,G,AA,-,26771074,26771089,0.9339399999999999,0.9339399999999999

I encounter the following error:

Traceback (most recent call last):
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 240, in <module>
    main()
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 221, in main
    pred = [m.predict(d, use_ref=True) for m in Models]
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 221, in <listcomp>
    pred = [m.predict(d, use_ref=True) for m in Models]
  File "/fs01/home/geparada/DeltaSplice/deltasplice/models/delta_pretrain.py", line 222, in predict
    torch.cat([X, mutX], 0), torch.cat([X, X], 0), exp)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 35000 but got size 35001 for tensor number 1 in the list.
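The 35000 vs 35001 mismatch appears consistent with the alt allele being spliced into the window without re-cropping it to the model's fixed input length. Below is a minimal sketch of one way to keep both windows the same length: mutate a longer context string, then cut the window from the same left edge. The function `mutate_window` and its parameters are hypothetical, the sequence is a toy example, and this is not necessarily how DeltaSplice, SpliceAI, or Pangolin handle indels.

```python
from typing import Tuple


def mutate_window(context: str, start: int, length: int,
                  offset: int, ref: str, alt: str) -> Tuple[str, str]:
    """Return (reference window, mutated window), both exactly `length` long.

    `context` must extend past the window on the right so that deletions
    can be backfilled with flanking sequence.
    """
    if not start <= offset < start + length:
        raise ValueError("variant lies outside the window")
    if context[offset:offset + len(ref)].upper() != ref.upper():
        raise ValueError("ref allele does not match context")
    ref_window = context[start:start + length]
    # Replace the ref allele with the alt allele in the full context.
    mutated = context[:offset] + alt + context[offset + len(ref):]
    # Crop from the same left edge; insertions push bases off the right end,
    # deletions pull flanking bases in.
    mut_window = mutated[start:start + length]
    if len(ref_window) != length or len(mut_window) != length:
        raise ValueError("not enough flanking sequence to fill the window")
    return ref_window, mut_window


context = "AACGGTCAA"  # toy sequence with one flanking base on each side
print(mutate_window(context, 1, 6, 3, "G", "AA"))  # insertion
print(mutate_window(context, 1, 6, 3, "GG", "G"))  # deletion
```

Note that with this left-anchored cropping, positions downstream of an indel no longer line up one-to-one between the two windows, so any per-position comparison of predictions would need to account for that shift.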

And if I try to evaluate deletions like this one:

chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,GG,G,-,26771074,26771089,0.9339399999999999,0.9339399999999999

I get the assertion error again:

Traceback (most recent call last):
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 240, in <module>
    main()
  File "/fs01/home/geparada/DeltaSplice/pred_deltassu.py", line 162, in main
    assert seq[mut_pos-seq_start]==ref.upper()
AssertionError
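As a stopgap, the input file can be split by variant type so that only SNVs are passed to the script. The sketch below classifies rows by the lengths of the ref and alt columns, using the column names from the files above; `variant_type` is a hypothetical helper and the labels are my own.

```python
import csv
import io


def variant_type(ref: str, alt: str) -> str:
    """Classify a variant by comparing ref and alt allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"  # multi-nucleotide substitution
    return "insertion" if len(alt) > len(ref) else "deletion"


# The four example variants from this issue, in the same CSV layout.
data = """chrom,mut_position,ref,alt,strand,exon_end,exon_start,acceptor_ssu,donor_ssu
chr10,26771112,G,A,-,26771074,26771089,0.9339399999999999,0.9339399999999999
chr10,26771112,GG,AA,-,26771074,26771089,0.9339399999999999,0.9339399999999999
chr10,26771112,G,AA,-,26771074,26771089,0.9339399999999999,0.9339399999999999
chr10,26771112,GG,G,-,26771074,26771089,0.9339399999999999,0.9339399999999999
"""

rows = list(csv.DictReader(io.StringIO(data)))
types = [variant_type(r["ref"], r["alt"]) for r in rows]
snv_rows = [r for r, t in zip(rows, types) if t == "SNV"]
print(types)
print(len(snv_rows))
```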

I am very interested in testing your tool further with our data, but we need support for evaluating a wide range of variants beyond simple SNVs. Both SpliceAI and Pangolin handle indels, but your model appears even more powerful, and it would be fantastic if it could also support indels and other types of substitutions.

I hope you’ll consider extending your tool to accommodate these types of variants.

Best regards!

@zhangchaolin
Contributor

zhangchaolin commented Nov 18, 2024 via email

@geparada
Author

Hello Chaolin,

Thanks a lot for considering addressing this issue in a future version of DeltaSplice.
Your model offers some key innovations (such as the Dual-sequence mode) that are very interesting, and I am looking forward to trying it on whole-genome variant profiles.

Guillermo
