Parallelize when possible? #10

federicomarini · 2021-01-26T13:37:00Z

Since the samples are processed one at a time, it might be worth parallelizing that step: runtimes could be shortened significantly, especially when running many samples at once.

BiocParallel might provide a convenient way to do so.
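To make the suggestion concrete, here is a minimal sketch of per-sample parallelization with `BiocParallel::bplapply()`. The worker `process_sample()` and the toy matrix are hypothetical stand-ins, not the package's actual deconvolution code; only `bplapply()`, `SerialParam()` and `MulticoreParam()` are real BiocParallel calls.

```r
library(BiocParallel)

# Hypothetical per-sample worker (an assumption for illustration):
# it just rescales one sample so its values sum to 1.
process_sample <- function(sample_expr) {
  sample_expr / sum(sample_expr)
}

# Toy expression matrix: genes in rows, samples in columns.
expr <- matrix(
  c(1, 3, 2, 2, 5, 5),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# Split the matrix into one numeric vector per sample.
per_sample <- lapply(seq_len(ncol(expr)), function(j) expr[, j])

# SerialParam() runs sequentially, which is handy for debugging.
# Swapping in e.g. MulticoreParam(workers = 2) distributes the samples
# over worker processes without changing the rest of the code.
results <- bplapply(per_sample, process_sample, BPPARAM = SerialParam())

# Reassemble the per-sample results into a genes x samples matrix.
result_mat <- do.call(cbind, results)
colnames(result_mat) <- colnames(expr)
```

Taking the backend as a `BPPARAM` argument would let users choose the parallelization strategy themselves, which is the usual Bioconductor convention.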

federicomarini · 2021-01-29T14:54:58Z

Related to this:
I did some profiling on the main function that runs quantiseq and noticed that the bottleneck is actually upstream of it, namely in the `mapGenes` function.
After some in-depth debugging, I think the solution in e0a8731 should be robust enough.
It may be worth porting to the current state of immunedeconv, so I am pinging @grst on this 😉

Then, a further improvement would be to run the aggregation only on the rows that have duplicated row names. That would speed things up "massively enough" that we would not really need to parallelize.
Happy to wrap up a tiny PR if you're all good on this!
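As a rough base R sketch of that idea (not the code in e0a8731): skip the aggregation entirely when no row names are duplicated, and otherwise aggregate only the duplicated rows. The helper name `aggregate_duplicated_rows()` is hypothetical, and summing is an assumption borrowed from the dplyr suggestion in this thread; the aggregation function actually used by `mapGenes` may differ.

```r
# Hypothetical helper: collapse rows that share a gene symbol,
# touching only the rows whose symbols are actually duplicated.
# Assumption: duplicates are combined by summing.
aggregate_duplicated_rows <- function(mat, symbols) {
  stopifnot(is.matrix(mat), length(symbols) == nrow(mat))

  # Fast path: every symbol is unique, so there is nothing to aggregate.
  if (anyDuplicated(symbols) == 0L) {
    rownames(mat) <- symbols
    return(mat)
  }

  # Flag every row whose symbol occurs more than once.
  is_dup <- symbols %in% symbols[duplicated(symbols)]

  # rowsum() sums the rows of a matrix within each group.
  summed <- rowsum(mat[is_dup, , drop = FALSE], group = symbols[is_dup])

  # Rows with unique symbols are passed through unchanged.
  unique_part <- mat[!is_dup, , drop = FALSE]
  rownames(unique_part) <- symbols[!is_dup]

  rbind(unique_part, summed)
}

# Toy input: symbol "A" appears twice, "B" and "C" once.
m <- matrix(
  c(1, 2, 3, 4, 10, 20, 30, 40),
  ncol = 2,
  dimnames = list(NULL, c("s1", "s2"))
)
collapsed <- aggregate_duplicated_rows(m, c("A", "B", "A", "C"))
```

Note that the row order of the result differs from the input (unique rows first, then aggregated ones), so downstream code should index by row name rather than by position.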

grst · 2021-01-29T14:59:33Z

A dplyr `group_by(gene_symbol) %>% summarise_all(sum)` should be considerably faster than base R.
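For reference, a minimal sketch of that dplyr call on a toy table. The column names `gene_symbol`, `s1` and `s2` and the values are made up for illustration; the speed claim itself is not benchmarked here.

```r
library(dplyr)

# Toy expression table (hypothetical data): one row per probe/transcript,
# with gene symbol "A" appearing twice.
expr_df <- data.frame(
  gene_symbol = c("A", "B", "A"),
  s1 = c(1, 2, 3),
  s2 = c(10, 20, 30)
)

# Sum all sample columns within each gene symbol.
# summarise_all() is superseded in recent dplyr versions;
# summarise(across(everything(), sum)) is the current equivalent.
agg <- expr_df %>%
  group_by(gene_symbol) %>%
  summarise_all(sum)
```

The result has one row per gene symbol, with the grouping column kept as a regular column rather than as row names.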

Happy to include the parallelized version in immunedeconv, but it is probably easiest to wait until this package is more or less ready and then port immunedeconv to use it as a dependency.

federicomarini · 2021-01-29T15:02:25Z

As of now no parallelization is done, just a conditional check: from my understanding, this aggregation needs to be done only if any row names are duplicated.

But as you said, it is probably best to give it time to settle in here and then just use it via Imports.
