Host map: ensure paired results #286

necrolyte2 · 2016-03-03T18:33:55Z

If R1 is filtered in host map remove R2 as well
If R2 is filtered in host map remove R1 as well

Reasoning:
If one read of a pair matches the host, then both should have matched, so both should be removed.
Right now, a bunch of R2 reads that should have been filtered out end up going through the costly BLAST stages.

averagehat · 2016-04-04T19:25:20Z

It wasn't obvious to me how to do this (without destroying RAM) until I realized the discarded reads are available to look at.

Sort the discarded reads (using unix sort).
Open host_map_R1.fastq and R1.discard.
Iterate through host_map_R1 as a generator:
- If the current ID in host_map_R1 equals the top ID in R1.discard:
  1. Filter that read out of host_map_R1.
  2. Advance to the next ID in R1.discard
     (essentially treating R1.discard like a stack).
Do the same for R2.

This way you only load one record/ID into memory at a time.

NB: It would probably be better to chunk the writes rather than writing records one at a time. Anyway, this will just pass SeqIO.write a generator, and I'm not sure whether it does chunking or not.
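The steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's code: `drop_discarded` is a hypothetical name, records are modelled as plain `(read_id, record)` tuples rather than Biopython objects, and both inputs are assumed to be sorted by ascending integer ID.

```python
def drop_discarded(records, discard_ids):
    """Yield (read_id, record) pairs whose ID is not in discard_ids.

    Assumes both inputs are sorted by ascending integer ID, so only one
    record and one discard ID are held in memory at a time.
    """
    discard_ids = iter(discard_ids)
    top = next(discard_ids, None)  # current "top" of the discard list
    for read_id, record in records:
        # Skip discard IDs smaller than the current read; they are not
        # present in this file.
        while top is not None and top < read_id:
            top = next(discard_ids, None)
        if top is not None and top == read_id:
            continue  # this read was discarded, so filter it out
        yield read_id, record


reads = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
kept = list(drop_discarded(reads, [2, 4, 5]))
```

Unlike the description above, this sketch silently skips discard IDs that never appear in the read file instead of treating them as an error; whether that is the right behaviour depends on how the discard files are produced.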

necrolyte2 · 2016-04-04T19:29:37Z

is host_map_R1 sorted already?

averagehat · 2016-04-04T19:32:46Z

No, I thought it would be, but it doesn't appear to be. Edit: this is wrong.


necrolyte2 · 2016-04-04T19:34:27Z

So you will have to sort both files then right?

averagehat · 2016-04-04T19:36:25Z

oops, I misread your comment.
host_map_R1 is sorted. R1.discard is not.

I'm not sure where the sorting for host_map_R1 comes from, or rather why it's maintained when the other isn't. But it's a precondition for this algorithm to work.

necrolyte2 · 2016-04-04T19:40:04Z

K. Might want to verify that the sort is enforced somehow prior

averagehat · 2016-04-04T19:47:40Z

At the least, it can be an assertion somewhere.
If you finish and R1.discard is non-empty, you know something went wrong.
or

last, current = current, next(seq)
assert int(current.id) > int(last.id)
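As a rough sketch of that check, the assertion could be wrapped in a generator that passes IDs through and fails on the first out-of-order one. The name `assert_sorted` is hypothetical, and the IDs are assumed to already be Python ints.

```python
def assert_sorted(ids):
    """Pass IDs through unchanged, failing if they are not strictly increasing."""
    last = None
    for current in ids:
        if last is not None and current <= last:
            raise ValueError(f"IDs out of order: {current} follows {last}")
        last = current
        yield current


checked = list(assert_sorted([1, 2, 5]))
```

Because it is a generator, it can sit in front of the filtering step without reading the whole file first.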

necrolyte2 · 2016-04-04T20:13:49Z

are they always going to be integers?

averagehat · 2016-04-04T20:15:38Z

yeah this is after the pipeline has set the ids to integers


averagehat · 2016-04-04T21:03:03Z

Whoops, I phrased this wrong. You run with host_map_R1 and R2.discard, and host_map_R2 and R1.discard.
R1.discard will already have been dropped from host_map_R1; you need to drop what was dropped from its mate.
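A tiny worked example of that cross-pairing, using made-up integer IDs. `drop_mate_discards` is a hypothetical helper, and it uses an in-memory set purely to keep the illustration short; it is not the one-record-at-a-time approach discussed earlier in the thread.

```python
def drop_mate_discards(read_ids, mate_discard_ids):
    """Return the read IDs that were not discarded from the mate file."""
    mate_discards = set(mate_discard_ids)  # in-memory only for brevity
    return [read_id for read_id in read_ids if read_id not in mate_discards]


# Made-up data: read 3 was already dropped from R1, read 4 from R2.
host_map_r1, r1_discard = [1, 2, 4, 5], [3]
host_map_r2, r2_discard = [1, 2, 3, 5], [4]

paired_r1 = drop_mate_discards(host_map_r1, r2_discard)
paired_r2 = drop_mate_discards(host_map_r2, r1_discard)
```

After both passes, the two files hold the same set of IDs, which is the paired result this issue asks for.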

necrolyte2 added bug enhancement labels Mar 3, 2016

necrolyte2 modified the milestone: host map paired and summary Mar 3, 2016

averagehat added the in progress label Mar 30, 2016

averagehat self-assigned this Mar 30, 2016

averagehat mentioned this issue Apr 4, 2016

add drop_mapped step #290

Closed

necrolyte2 removed this from the host map paired and summary milestone Apr 5, 2016

averagehat mentioned this issue Apr 6, 2016

host_map counts unmapped only if both mates unmapped #291

Open
