Read Pairing & Merge #54

Open
gordonkoehn opened this issue Dec 2, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@gordonkoehn
Collaborator

Currently, David's script does not take insertions into account.

There is a file produced called insertions, but it is always empty.

David's Message:

Right, so the code for it is in the uploaded script, just commented out. When talking about this with Ivan, he said that if the insertion doesn't match in both reads we should probably throw it out, so you can just adapt the code to that (the logic is already there; the prints say what each if branch means).

For insertions outside the overlap, adding them as is should be fine; for those inside the overlap, the simplest and safest approach would be to add only those that match completely (I think that's in line with what Ivan said) and ignore the rest. What the script currently does is take read 1's version of the insertions in the overlap.
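A minimal sketch of the rule described above, not the code from the linked script. It assumes each read's insertions are available as a dict mapping a reference position to the inserted bases, and that the overlap is a half-open reference interval; the name `merge_insertions` and both of these representations are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: reference position -> inserted bases.
Insertions = Dict[int, str]


def merge_insertions(
    ins1: Insertions,
    ins2: Insertions,
    overlap: Tuple[int, int],
) -> Insertions:
    """Merge the insertions of a read pair.

    `overlap` is the half-open reference interval [start, end) that is
    covered by both reads (an assumption of this sketch).
    """
    start, end = overlap
    merged: Insertions = {}
    for pos in sorted(set(ins1) | set(ins2)):
        seq1, seq2 = ins1.get(pos), ins2.get(pos)
        if not (start <= pos < end):
            # Outside the overlap: keep the insertion as is.
            merged[pos] = seq1 if seq1 is not None else seq2
        elif seq1 is not None and seq1 == seq2:
            # Inside the overlap: keep only insertions that match completely.
            merged[pos] = seq1
        # Otherwise (mismatch, or present in only one read): ignore it.
    return merged


read1 = {5: "AC", 12: "G", 15: "TT"}
read2 = {12: "G", 15: "TA", 30: "C"}
result = merge_insertions(read1, read2, overlap=(10, 20))
```

Here the insertion at position 15 is dropped because the two reads disagree, while the one at position 12 is kept because both reads report the same bases.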

Ivan said that misaligned insertions are probably due to incorrect sequencing that propagates the error along the strand, so the best way of doing this might be to check which strand is closest to the reference (so also fewer deletions) and take the insertions from there.
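One way to read "closest to the reference" is sketched below, under stated assumptions: both reads are already aligned to the reference over the overlap, are given as equal-length strings with `-` marking a deletion, and distance is simply the number of differing positions. The function names and the distance measure are hypothetical; ties go to read 1, which mirrors what the script currently does.

```python
from typing import Dict

# Hypothetical representation: reference position -> inserted bases.
Insertions = Dict[int, str]


def edit_count(aligned: str, reference: str) -> int:
    """Count positions where the aligned read differs from the reference.

    A '-' in the read (a deletion) counts as a difference.
    """
    if len(aligned) != len(reference):
        raise ValueError("aligned read and reference must have equal length")
    return sum(a != r for a, r in zip(aligned, reference))


def pick_overlap_insertions(
    ref_overlap: str,
    read1_overlap: str,
    read2_overlap: str,
    ins1: Insertions,
    ins2: Insertions,
) -> Insertions:
    """Return the overlap insertions of the read closest to the reference."""
    d1 = edit_count(read1_overlap, ref_overlap)
    d2 = edit_count(read2_overlap, ref_overlap)
    # Ties fall back to read 1.
    return dict(ins1) if d1 <= d2 else dict(ins2)


chosen = pick_overlap_insertions(
    ref_overlap="ACGTACGT",
    read1_overlap="ACGTAC-T",  # one difference
    read2_overlap="AC-TTC-T",  # three differences
    ins1={4: "GG"},
    ins2={4: "GA"},
)
```

This keeps the decision per read pair rather than per insertion, so all overlap insertions come from the same read.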

See the original code here:

https://github.com/GenSpectrum/LAPIS-SILO/blob/position-index-deletion-threshold-sam/scripts/read.py

Or we move to Michael's code:

https://gist.github.com/gordonkoehn/f08b1166d9904db0df5fc014dcabf79d

@gordonkoehn gordonkoehn added the enhancement New feature or request label Dec 2, 2024
@gordonkoehn
Collaborator Author

gordonkoehn commented Dec 2, 2024

Perhaps after all we'll use this code here:

@gordonkoehn
Collaborator Author

To be addressed once we know how we do:

@gordonkoehn gordonkoehn changed the title fix: insertions in _process Read Pairing & Merge Jan 8, 2025
@gordonkoehn
Collaborator Author

It makes sense to use Michael's code (see above), as it takes an alignment and outputs an alignment. This way, we could simply merge the read pair before the translation.
