Host map: ensure paired results #286
Comments
It wasn't obvious to me how to do this (without destroying RAM) until I realized the discarded reads are available to look at.
This way we only load one record/id into memory at a time. NB: it would probably be better to chunk the writes rather than writing records one at a time. As it stands, this just passes SeqIO.write a generator, and I'm not sure whether it does any chunking.
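A minimal sketch of the one-record-at-a-time idea described above, using only the standard library. The `(id, payload)` tuples and the name `drop_discarded` are hypothetical stand-ins, not code from the pipeline. It assumes both the kept records and the mate's discarded ids arrive sorted by ascending integer id.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for a sequence record: (integer read id, payload).
Record = Tuple[int, str]


def drop_discarded(kept: Iterable[Record],
                   discarded_ids: Iterable[int]) -> Iterator[Record]:
    """Yield records from `kept` whose id does not appear in `discarded_ids`.

    Assumes both inputs are sorted by ascending integer id, so only one
    record and one discarded id are held in memory at any time.
    """
    discard_iter = iter(discarded_ids)
    current_discard: Optional[int] = next(discard_iter, None)
    for rec_id, payload in kept:
        # Advance the discard stream until it reaches or passes this record.
        while current_discard is not None and current_discard < rec_id:
            current_discard = next(discard_iter, None)
        if current_discard == rec_id:
            continue  # the mate was discarded, so drop this read too
        yield rec_id, payload


kept_r1 = [(1, "a"), (2, "b"), (4, "c"), (7, "d")]
r2_discard = [2, 3, 7]
survivors = list(drop_discarded(kept_r1, r2_discard))
```

Because `drop_discarded` is itself a generator, its output could be handed straight to a writer that accepts an iterable, which is the pattern the comment describes.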
Is host_map_R1 sorted already?
No, I thought it would be, but it doesn't appear to be. Edit: this is wrong. (Sent by email in reply to Tyghe Vallard's comment of Mon, Apr 4, 2016 at 3:29 PM.)
So you will have to sort both files then, right?
Oops, I misread your comment. I'm not sure where the sorting for host_map_R1 comes from, or rather why it's maintained when the other isn't. But it's a precondition for this algorithm to work.
K. Might want to verify that the sort is enforced somehow beforehand.
At the least it can be an assertion somewhere:

    current = next(seq)
    assert current.id > last.id
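One way the assertion above could be wrapped around a stream, as a rough sketch. The name `checked_sorted` is hypothetical. It converts each id with `int()` before comparing, because ids read from a file are typically strings and string comparison would order "10" before "9".

```python
from typing import Iterable, Iterator, Optional


def checked_sorted(ids: Iterable[str]) -> Iterator[int]:
    """Yield ids as integers, raising if they are not strictly increasing."""
    last: Optional[int] = None
    for raw in ids:
        current = int(raw)  # compare numerically, not lexicographically
        if last is not None and current <= last:
            raise ValueError(
                f"ids are not sorted: {current} follows {last}"
            )
        last = current
        yield current


ordered = list(checked_sorted(["1", "2", "10"]))

try:
    list(checked_sorted(["2", "1"]))
    unsorted_detected = False
except ValueError:
    unsorted_detected = True
```

Raising an exception rather than using a bare `assert` keeps the check active even when Python runs with optimizations enabled.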
Are they always going to be integers?
Yeah, this is after the pipeline has set the ids to integers. (Sent by email in reply to Tyghe Vallard's comment of Mon, Apr 4, 2016 at 4:13 PM.)
Whoops, I phrased this wrong. You run with host_map_R1 and R2.discard, and with host_map_R2 and R1.discard.
If R1 is filtered in host map, remove R2 as well.
If R2 is filtered in host map, remove R1 as well.
Reasoning:
If one read of a pair matches the host, then both should have matched, so both should be removed.
Right now, a bunch of R2 reads that should have been filtered out end up going through the costly blast stages.
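The pairing rule above amounts to keeping only ids that survive in both R1 and R2. A rough sketch of that as a streaming intersection, assuming each stream of surviving ids is already sorted in ascending integer order (the name `paired_ids` is hypothetical):

```python
from typing import Iterable, Iterator


def paired_ids(r1_ids: Iterable[int], r2_ids: Iterable[int]) -> Iterator[int]:
    """Yield ids present in both sorted streams, one id per stream in memory."""
    it1, it2 = iter(r1_ids), iter(r2_ids)
    a, b = next(it1, None), next(it2, None)
    while a is not None and b is not None:
        if a == b:
            yield a  # both mates survived host filtering
            a, b = next(it1, None), next(it2, None)
        elif a < b:
            a = next(it1, None)  # R1 read has no surviving mate
        else:
            b = next(it2, None)  # R2 read has no surviving mate


both_kept = list(paired_ids([1, 2, 4, 5], [2, 3, 5, 6]))
```

In this example, ids 1 and 4 are dropped from R1 and ids 3 and 6 from R2, since each is missing its mate.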