Adding support for kallisto counts #185

skanwal · 2025-01-16T22:58:09Z

Another one for method/input generalisation.
Closes #184.

FYI @JMarzec

pdiakumis

Some small changes. Probably should test this with different options.

R/kallisto.R

R/sample_data.R

pdiakumis

A bit more work needed on the counts logic I think.

pdiakumis · 2025-01-20T00:26:13Z

R/kallisto.R

-    dplyr::mutate(across(
-      .cols = matches('count'),
-      .fns = ~ as.integer(.x)))
+    dplyr::mutate(count = as.integer(.data$count))


You can combine multiple mutates into a single call instead of duplicating them, e.g. https://dplyr.tidyverse.org/reference/mutate.html#ref-examples

I tend to use separate calls only when the expression is complex.
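To illustrate the suggestion, here is a minimal sketch comparing two chained `mutate()` calls with one combined call. It assumes a toy data frame with hypothetical columns `est_counts` and `tpm`, which are not necessarily the columns `kallisto_counts()` works with.

```r
library(dplyr)

# Toy input with hypothetical column names, for illustration only
abund <- data.frame(
  target_id = c("tx1", "tx2", "tx3"),
  est_counts = c(10.6, 3.2, 0),
  tpm = c(1.234, 5.678, 0)
)

# Two separate mutate() calls, one per column
two_calls <- abund |>
  dplyr::mutate(count = as.integer(.data$est_counts)) |>
  dplyr::mutate(tpm = round(.data$tpm, 2))

# The same transformations combined into a single mutate() call
one_call <- abund |>
  dplyr::mutate(
    count = as.integer(.data$est_counts), # as.integer() truncates toward zero
    tpm = round(.data$tpm, 2)
  )

print(identical(two_calls, one_call))
```

Both versions give the same table. The single call is easier to read when each expression is short; separate calls remain reasonable for long or complex expressions.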

pdiakumis · 2025-01-20T00:29:24Z

R/sample_data.R

@@ -69,9 +69,9 @@ read_sample_data <- function(p, results_dir, tx2gene = NULL) {
  kallisto <- p[["kallisto"]]

  # check which quant input is provided
-  if (!isnull(salmon)) {
+  if (is.null(salmon)) {


Why did the ! disappear?

pdiakumis · 2025-01-20T00:29:39Z

R/sample_data.R

    counts <- salmon_counts(salmon, tx2gene = tx2gene)
-  } else if ((!isnull(kallisto))) {
+  } else if ((is.null(kallisto))) {


Why did the ! disappear?

I think the logic needs a bit more work here. What are the different options you're expecting? One can provide:

- a kallisto file
- a salmon file
- a salmon and a kallisto file
- neither
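The four cases can be made explicit with a small helper. This is a hypothetical sketch: `classify_quant_inputs()` is not an existing RNAsum function, and it assumes `p` is the params list whose `salmon` and `kallisto` elements are `NULL` when not provided. The file names are example paths only.

```r
# Hypothetical helper: report which quantification input(s) the params list provides.
# Assumes p$salmon and p$kallisto are NULL (or absent) when not supplied.
classify_quant_inputs <- function(p) {
  has_salmon <- !is.null(p[["salmon"]])
  has_kallisto <- !is.null(p[["kallisto"]])
  if (has_salmon && has_kallisto) {
    "both"
  } else if (has_salmon) {
    "salmon"
  } else if (has_kallisto) {
    "kallisto"
  } else {
    "neither"
  }
}

classify_quant_inputs(list(salmon = "quant.sf"))
classify_quant_inputs(list(kallisto = "abundance.tsv"))
classify_quant_inputs(list(salmon = "quant.sf", kallisto = "abundance.tsv"))
classify_quant_inputs(list())
```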

For the last two options, would it suffice to return NULL in these cases, that is:

else if (!is.null(kallisto) && !is.null(salmon)) { return(NULL) } else { return(NULL) }

Or should we generate specific error messages in the main Rmd?

If you put a return statement there, it will prematurely terminate the read_sample_data function, which I believe we don't want.
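A toy sketch of why: `return()` in R exits the enclosing function immediately, so anything after the `if`/`else` chain is skipped. `read_data_toy()` and its steps are hypothetical stand-ins, not RNAsum code.

```r
# Toy stand-in for a function that reads counts and then does more work.
read_data_toy <- function(salmon = NULL) {
  steps <- character(0)
  if (!is.null(salmon)) {
    steps <- c(steps, "read salmon")
  } else {
    return(NULL) # exits read_data_toy() right here
  }
  # Only reached when the early return above did not fire
  steps <- c(steps, "read other sample data")
  list(steps = steps)
}

read_data_toy(salmon = "quant.sf")$steps
read_data_toy() # NULL: the remaining steps were skipped
```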

Do you think the dragen_wts_dir should be removed and instead rnasum should just work directly with file paths? That would simplify this a bit in terms of logic. You'd just be dealing with a list p which may or may not have elements salmon and kallisto.

Do you think the dragen_wts_dir should be removed and instead rnasum should just work directly with file paths?

Wouldn't that also require refactoring at the infrastructure level, since we use it as an input to the CWL workflow?
Not saying no, but perhaps it's something we should add as a to-do for the next release.

For the last two options, how about assigning NULL to counts and then adding an assert statement with an error message in the Rmd before the RNAsum::combineDatasets call, so that it doesn't proceed with further processing?
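A minimal sketch of the proposed assertion, assuming `counts` has been set to NULL for the invalid cases. Base R's `stopifnot()` accepts a named expression whose name becomes the error message (R >= 4.0.0). The helper name `assert_counts()` and the message text are hypothetical.

```r
# Hypothetical check to run in the Rmd before RNAsum::combineDatasets().
assert_counts <- function(counts) {
  stopifnot(
    "Exactly one of the salmon or kallisto inputs must be provided." = !is.null(counts)
  )
  invisible(TRUE)
}

assert_counts(data.frame(gene = "g1", count = 5L)) # passes silently
res <- tryCatch(assert_counts(NULL), error = function(e) conditionMessage(e))
print(res)
```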

If that makes sense, please go ahead; sorry if I'm being a bit picky.
If I were to do this properly, I'd create a new function (e.g. check_params) that takes the params list as input, checks each element's validity, and outputs a new list with, e.g., a counts element that gets used in the Rmd. But I'm happy for you to implement the kallisto support however you wish; just try to make sure the basic cases are supported and that the logic works. Also, I'd try to stop the Rmd execution as early as possible upon an invalid param specification.
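A rough sketch of the check_params idea, assuming the params list carries optional `salmon` and `kallisto` file paths. The function name comes from the comment above; the returned `quant` element and the error messages are illustrative assumptions. Unlike the comment's suggestion, this sketch only records which input to read rather than reading the counts, and a fuller version might also check that the file exists.

```r
# Hypothetical check_params(): validate the quant inputs up front and return
# a list describing which input to read downstream. Stops early on invalid input.
check_params <- function(params) {
  salmon <- params[["salmon"]]
  kallisto <- params[["kallisto"]]
  if (!is.null(salmon) && !is.null(kallisto)) {
    stop("Provide either a salmon or a kallisto file, not both.", call. = FALSE)
  }
  if (is.null(salmon) && is.null(kallisto)) {
    stop("No salmon or kallisto file provided.", call. = FALSE)
  }
  quant <- if (!is.null(salmon)) {
    list(type = "salmon", path = salmon)
  } else {
    list(type = "kallisto", path = kallisto)
  }
  list(quant = quant)
}

str(check_params(list(kallisto = "abundance.tsv")))
```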

Spent some time testing the following input combinations to make sure Rmd execution stops early if the quant inputs are incorrect (ref: last two commits):

- kallisto file
- salmon file
- salmon and kallisto file
- neither

pdiakumis · 2025-01-28T04:20:58Z

R/sample_data.R

+  # check which quant input is provided
+  if (!is.null(salmon)) {
+    counts <- salmon_counts(salmon, tx2gene = tx2gene)
+  } else if (!is.null(kallisto)) {
+    counts <- kallisto_counts(kallisto, tx2gene = tx2gene)
+  } else if (!is.null(kallisto) && !is.null(salmon)) {
+    counts <- NULL
+  } else {
+    counts <- NULL
+  }


Looking at the logic here, let's say we have the booleans A (salmon is specified) and B (kallisto is specified). This is what's happening here:

if (A) {
  # if salmon is specified ...
} else if (B) {
  # so salmon is not specified but kallisto is ...
} else if (A && B) {
  # so salmon is not specified, and kallisto is not specified, but you're checking if both of them are specified ...
} else {
  ...
}
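Evaluating the chain for every combination of A and B makes the problem concrete: the `A && B` branch is never reached, because any case with A TRUE is already caught by the first branch. Testing the combined condition first fixes this. A sketch with toy functions that only return a branch label (the function names are hypothetical):

```r
# Branch order as in the diff: the A && B test comes after A alone.
original_chain <- function(A, B) {
  if (A) {
    "salmon"
  } else if (B) {
    "kallisto"
  } else if (A && B) {
    "both" # unreachable: A is FALSE whenever we get here
  } else {
    "neither"
  }
}

# Reordered: test the combined condition first.
reordered_chain <- function(A, B) {
  if (A && B) {
    "both"
  } else if (A) {
    "salmon"
  } else if (B) {
    "kallisto"
  } else {
    "neither"
  }
}

grid <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))
grid$original <- mapply(original_chain, grid$A, grid$B)
grid$reordered <- mapply(reordered_chain, grid$A, grid$B)
print(grid)
```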

pdiakumis · 2025-01-28T04:25:33Z

inst/rmd/rnasum.Rmd

@@ -122,6 +123,11 @@ ref_dataset.list <- vector("list", length(dataset)) |> set_names(dataset)
 ref_genes.list <- RNAsum::get_refgenes(params)
 # sample WTS/WGS data
 sample_data.list <- RNAsum::read_sample_data(params, results_dir, tx2gene = tx2ensembl)
+# check counts input was correctly read and not null


So the R session will first call RNAsum::read_sample_data, which is an intensive and time-consuming process. If no salmon or kallisto input has been provided, all that work goes to waste. What if you catch that error at the beginning of RNAsum::read_sample_data?
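A toy sketch of failing fast: validation runs before the expensive step, so nothing is wasted when no quant input is given. `read_sample_data_toy()` and `expensive_step()` are hypothetical stand-ins, not RNAsum functions; the counter only records whether the expensive step ran.

```r
# Counter kept in an environment so the toy functions can update it
counter <- new.env()
counter$calls <- 0

# Stand-in for the slow part of reading sample data
expensive_step <- function() {
  counter$calls <- counter$calls + 1
  "processed"
}

read_sample_data_toy <- function(p) {
  # Validate first, before any expensive work
  if (is.null(p[["salmon"]]) && is.null(p[["kallisto"]])) {
    stop("Neither a salmon nor a kallisto input was provided.", call. = FALSE)
  }
  expensive_step()
}

try(read_sample_data_toy(list())) # errors; expensive_step() is skipped
counter$calls # 0
read_sample_data_toy(list(salmon = "quant.sf"))
counter$calls # 1
```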

skanwal added 2 commits January 16, 2025 11:02

- adding support for kallisto counts (449f5af)
- update cli (b374a1b)

skanwal requested a review from pdiakumis January 16, 2025 22:58

pdiakumis requested changes Jan 17, 2025

R/kallisto.R (resolved)

R/kallisto.R (outdated, resolved)

R/sample_data.R (outdated, resolved)

R/sample_data.R (outdated, resolved)

skanwal added 2 commits January 17, 2025 15:07

- update filter and mutate conditions (f371ed6)
- update null condition (6c6243d)

pdiakumis requested changes Jan 20, 2025

skanwal added 3 commits January 21, 2025 14:32

- combine mutate call (1df231b)
- check which quant file is provided as an input (1eb579d)
- check for valid input param (a5695cd)

pdiakumis requested changes Jan 28, 2025
