Currently, David's script does not take insertions into account.
There is a file produced called insertions, but it is always empty.
David's Message:
Right, so the code for it is in the uploaded script, just commented out. When I talked about this with Ivan, he said that if the insertion doesn't match in both reads we should probably throw it out, so you can just adapt the code to that (the logic is already there; the prints say what each if branch means).
For insertions outside the overlap, adding them as is should be fine; for those inside the overlap, the simplest and safest option would be to add only those that match completely (I think that's in line with what Ivan said) and ignore the rest. What the script currently does is take read1's version of the insertions in the overlap.
Ivan said that misaligned insertions are probably due to a sequencing error that propagates along the strand, so the best approach might be to check which strand is closest to the reference (so also fewer deletions) and take the insertions from there.
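A minimal sketch of the "keep only matching insertions in the overlap" rule from David's message, to make the proposal concrete. It is not the code from `read.py`: the function name `merge_insertions`, the representation of insertions as a dict from reference position to inserted bases, and the half-open `(start, end)` overlap interval are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Assumed representation: reference position -> inserted bases.
Insertions = Dict[int, str]


def merge_insertions(
    ins1: Insertions,
    ins2: Insertions,
    overlap: Optional[Tuple[int, int]],
) -> Insertions:
    """Merge the insertions of a read pair.

    `overlap` is the half-open interval (start, end) of reference
    positions covered by both reads, or None if the reads do not overlap.
    """
    merged: Insertions = {}
    for pos in sorted(set(ins1) | set(ins2)):
        seq1, seq2 = ins1.get(pos), ins2.get(pos)
        in_overlap = overlap is not None and overlap[0] <= pos < overlap[1]
        if not in_overlap:
            # Outside the overlap only one read covers the position:
            # keep its insertion as is.
            merged[pos] = seq1 if seq1 is not None else seq2
        elif seq1 == seq2:
            # Inside the overlap: keep only insertions on which both
            # reads agree completely.
            merged[pos] = seq1
        # Otherwise the reads disagree (or only one read has the
        # insertion), so the insertion is dropped.
    return merged


read1 = {5: "AC", 12: "G", 15: "TT"}
read2 = {12: "G", 15: "TA", 18: "C", 30: "A"}
print(merge_insertions(read1, read2, overlap=(10, 20)))
# → {5: 'AC', 12: 'G', 30: 'A'}
```

The alternative Ivan suggested (taking the insertions from the read that is closest to the reference) would only change the last branch: instead of dropping a disagreeing insertion, the version from the read with fewer mismatches and deletions would be kept.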
It makes sense to use Michael's code, see above, as it takes an alignment and outputs an alignment. This way, we could just merge the read pair before the translation.
See the original code here:
https://github.com/GenSpectrum/LAPIS-SILO/blob/position-index-deletion-threshold-sam/scripts/read.py
Or we move to Michael's code:
https://gist.github.com/gordonkoehn/f08b1166d9904db0df5fc014dcabf79d