Add calculate_cell_cluster_metrics() function #23

cansavvy · 2024-12-18T14:58:19Z

Background

This PR is for #10. I tried to follow the context there, but since this is my first PR on this project I might be missing some insights, so I'm posting this as a draft first so that someone can check that I'm heading in generally the right direction.

Summary

This is a function that takes output from sweep_clusters() and runs evals on it. It can run calculate_silhouette() and/or calculate_purity() on all elements of the list of data frames that sweep_clusters() outputs.

One additional, very nitpicky change I have in here: I was very slightly thrown off by the example objects in sweep_clusters() being named cluster_df when they are actually lists of data frames and not data frames outright. If you don't like this change, no worries. The documentation itself is very clear, but I'm just a person who kinda goes straight for the examples first.

Requested feedback

Am I understanding this right? It doesn't seem like this function will be that useful, but I trust y'all have more context and knowledge about the needs of the project than I do, since I just started looking at this stuff last week lol.

Side side question that is also very minor, can we name the data frames in the list according to the combo of their parameters or do we find that would be too clunky? I like names in my lists but this is maybe a cansavvy quirk we don't need to subject everyone else to.
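To make the naming question concrete, here is a minimal sketch in base R of how list elements could be named after their parameter combinations. The `sweep_list` object below is a hypothetical stand-in (a list of empty data frames), the name format is just one possibility, and the assumption that the real sweep output follows the same order as `expand.grid()` is not confirmed anywhere in this PR.

```r
# Hypothetical parameter grid; nn and resolution values are illustrative only
params <- expand.grid(nn = c(10, 15, 25), resolution = c(0.75, 1))

# Stand-in for the sweep output: one (empty) data frame per parameter combination
sweep_list <- lapply(seq_len(nrow(params)), function(i) data.frame())

# Build one name per row of the grid, e.g. "nn10_res0.75"
names(sweep_list) <- paste0("nn", params$nn, "_res", params$resolution)

names(sweep_list)
```

Named elements make it possible to pull out one result with `sweep_list[["nn10_res0.75"]]`, at the cost of longer names as more parameters are swept.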

cansavvy · 2024-12-18T14:59:38Z

R/evaluate-clusters.R

+#'   The cell id column's values should match either the PC matrix row names, or the
+#'   SingleCellExperiment/Seurat object cell ids. Typically this data frame will be
+#'   output from the `rOpenScPCA::calculate_clusters()` function.
+#' @param ... Additional argument are passed on to the respective `calculate_purity()` and


I recognize that I didn't look too closely into these arguments. If someone wanted to specify a different argument for purity versus silhouette, they would not be able to do so this way, because all arguments get passed to both functions. If we think this will be a common use case, I can go back and adjust.

Since these arguments are passed to different functions with different expectations, these are likely to conflict. I think it is probably better for this convenience function not to use .... But I don't know that we need to pass further options; if something more complex is needed, the user can run purrr::map on their own.
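The conflict described above can be reproduced with a small base R sketch. `metric_a()`, `metric_b()`, and `run_both()` are hypothetical stand-ins, not the package's functions; the only point is that forwarding the same `...` to two functions with different arguments fails as soon as an argument belongs to just one of them.

```r
# Hypothetical stand-ins: each metric function accepts a different optional argument
metric_a <- function(df, scale = TRUE) "a ran"
metric_b <- function(df, maximum = 10) "b ran"

# A wrapper that forwards the same ... to both functions
run_both <- function(df, ...) {
  list(a = metric_a(df, ...), b = metric_b(df, ...))
}

# With no extra arguments, both calls succeed
ok <- run_both(data.frame())

# An argument meant only for metric_a() is also sent to metric_b(), which rejects it;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(
  run_both(data.frame(), scale = FALSE),
  error = function(e) conditionMessage(e)
)
result
```

Here `result` holds the error message from the second call rather than a list of metric results, which is the failure mode a user would hit when trying to customize only one metric.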

cansavvy · 2024-12-18T15:00:10Z

R/evaluate-clusters.R

+#'
+#' set.seed(2024)
+#'
+#' sce_object <- splatter::simpleSimulate(nGenes = 1000, verbose = FALSE) |>


I added these steps just because I wanted to illustrate how I was testing this, but if this is too much detail for this example we can trim it down.

I think it's probably too much detail just in the sense that a novice might look at this and say, "oh no, do i need splatter?"

I would simplify by assuming an sce_object variable already exists. Consistent with other evaluation function examples, you don't need to pull out the PCA either; just pass in the object directly. So let's have the example just "run" (i.e., keep the \dontrun{} construct!) sweep_clusters() and calculate_cell_cluster_metrics().

Is there an example sce_object that already exists I can pull from? If so how do I call it?

Since these examples are not run, it is fine to just "assume" an sce_object exists for this section, and you can start from the sweep_clusters() step (skipping the prep).

R/evaluate-clusters.R

cansavvy · 2024-12-18T15:18:33Z

Still left to do here:

Polishing docs (I copied and pasted and edited but IDK) -- did this recently
Writing tests - This is in Tests for calculate_cell_cluster_metrics() function #29
Incorporating any feedback - Did this in my most recent commits

jashapiro

Thanks for this contribution! My main comment here (aside from my earlier misread) is that I still wonder if we want this function to work on a list, rather than just on a single clustering data frame. The convenience of calculating all the metrics at once makes sense to me, but if we have a function that evaluates a single data frame, then turning that into evaluating the list is a very simple addition of a purrr wrapper, and I think that seems more transparent. But others may disagree!
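As a rough illustration of the single-data-frame design suggested above, the sketch below uses a hypothetical `evaluate_one()` that only adds a placeholder column; a real version would call the silhouette and purity functions instead. Base R's `lapply()` is used to avoid a dependency, and `purrr::map(sweep_list, evaluate_one)` would be the purrr equivalent.

```r
# Hypothetical single-data-frame evaluator: adds a placeholder metric column
evaluate_one <- function(cluster_df) {
  cluster_df$metric <- NA_real_ # placeholder for real metric values
  cluster_df
}

# Stand-in for a sweep result: a list of small clustering data frames
sweep_list <- list(
  data.frame(cell_id = c("a", "b"), cluster = c(1, 2)),
  data.frame(cell_id = c("a", "b"), cluster = c(1, 1))
)

# Evaluating the whole sweep is then a one-line wrapper around the single-df function
evaluated <- lapply(sweep_list, evaluate_one)
```

With this split, the list handling stays visible to the user rather than hidden inside the evaluation function.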

R/evaluate-clusters.R

jashapiro · 2024-12-18T15:16:01Z

R/evaluate-clusters.R

+#'   The cell id column's values should match either the PC matrix row names, or the
+#'   SingleCellExperiment/Seurat object cell ids. Typically this data frame will be
+#'   output from the `rOpenScPCA::calculate_clusters()` function.
+#' @param ... Additional argument are passed on to the respective `calculate_purity()` and


Since these arguments are passed to different functions with different expectations, these are likely to conflict. I think it is probably better for this convenience function not to use .... But I don't know that we need to pass further options; if something more complex is needed, the user can run purrr::map on their own.

sjspielman

Thanks for starting this!! I left some initial comments here, but before I look more, I have a thought about the use cases for this function. Currently it's written to only run on a sweep list, but it makes sense to me to also make this flexible enough to run on a single data frame (i.e., not a list of data frames). This means updates related to the input argument sweep_list:

First, give it a more flexible name... Maybe something like cluster_results? I don't love it, but I'm not sure of how else to communicate that it might be either a list of dfs or df, so really I don't hate it either!
Second, add a check if it's a data frame and if so, make it a list of length 1 with the given data frame in it. So the code might look like:

if (data frame) { listify it}
else { run the existing stopifnot checks}

I suppose we'd need another check to determine whether to return the list or just its first element, too, so you might actually structure this code by first defining an is_df variable or so, and using that for both checks (the opening sanity check to transform it into a list to play nicely with purrr, and the final check for what to return).
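The is_df idea above could be sketched roughly as follows. `evaluate_flexible()` and its placeholder `metric` column are hypothetical, and the validation message is invented for illustration; only the control flow (wrap a data frame in a list, then unwrap it at the end) follows the suggestion.

```r
# Hypothetical function accepting either one data frame or a list of data frames
evaluate_flexible <- function(cluster_results) {
  # Check for a data frame first: a data frame is also a list in R
  is_df <- is.data.frame(cluster_results)

  if (is_df) {
    # Wrap the single data frame so the rest of the code can treat input as a list
    cluster_results <- list(cluster_results)
  } else {
    stopifnot(
      "cluster_results must be a data frame or a list of data frames" =
        is.list(cluster_results) &&
          all(vapply(cluster_results, is.data.frame, logical(1)))
    )
  }

  # Placeholder evaluation; a real version would compute the requested metrics
  evaluated <- lapply(cluster_results, function(df) {
    df$metric <- NA_real_
    df
  })

  # Return the same shape that was passed in
  if (is_df) evaluated[[1]] else evaluated
}

single <- evaluate_flexible(data.frame(cell_id = "a", cluster = 1))
multi <- evaluate_flexible(list(data.frame(cell_id = "a", cluster = 1)))
```

Checking `is.data.frame()` before `is.list()` matters here, because a data frame would otherwise pass the list check and be iterated column by column.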

R/evaluate-clusters.R

sjspielman · 2024-12-18T15:56:09Z

R/evaluate-clusters.R

+#'
+#' set.seed(2024)
+#'
+#' sce_object <- splatter::simpleSimulate(nGenes = 1000, verbose = FALSE) |>


I think it's probably too much detail just in the sense that a novice might look at this and say, "oh no, do i need splatter?"

I would simplify by assuming an sce_object variable already exists. Consistent with other evaluation function examples, you don't need to pull out the PCA either; just pass in the object directly. So let's have the example just "run" (i.e., keep the \dontrun{} construct!) sweep_clusters() and calculate_cell_cluster_metrics().

sjspielman

Noting I haven't run this yet, but will run after this round of review! Let me know if I can clarify anything here :)

sjspielman · 2025-01-27T16:31:02Z

R/sweep-clusters.R

I very much take your point here about the cluster_df name confusion; definitely fine with this change.

R/evaluate-clusters.R

sjspielman · 2025-01-27T16:39:29Z

R/evaluate-clusters.R

+#'   resolution = 0.1,
+#'   seed = 11


Bringing in @jashapiro if you have an opinion here -

In hello-clusters usage examples, we only set the seed once at the beginning of the code (as you did above with set.seed(2024)), without passing a seed into individual functions. When I was reviewing and saw this, I wondered whether we should remove the seed here as well. But then I checked the other doc examples in rOpenScPCA itself, and it seems we do not use set.seed() anywhere in those docs but instead pass in a seed argument directly all around!

I sort of think we want to firm this up and generally present set.seed() in all examples and not the seed argument, but maybe that level of consistency is Stephanie-style overkill. Who has thoughts?

I would remove passing in a seed in the examples (all around I guess), but I would also not show set.seed() in the package code examples. That level of detail makes the most sense to me in the example notebooks, not in individual function docs.

I would remove passing in a seed in the examples (all around I guess),

@cansavvy I can take care of this separately unless you are feeling very eager, up to you!

Go for it! I might not get to it for a few days!

I'm not necessarily either, so I'll make it an issue!
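For reference, the two seeding styles discussed in this thread can be compared with a small base R sketch. `noisy_step()` is a hypothetical stand-in for any stochastic function with an optional seed argument; it is not part of the package.

```r
# Hypothetical stochastic step with an optional seed argument
noisy_step <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample.int(100, n)
}

# Style 1: set the seed once, up front, as in a notebook
set.seed(2024)
run1 <- noisy_step(3)
set.seed(2024)
run2 <- noisy_step(3)
identical(run1, run2)

# Style 2: pass the seed to the function itself
identical(noisy_step(3, seed = 11), noisy_step(3, seed = 11))
```

Both comparisons return TRUE, so either style gives reproducible results; the question in this thread is only which one the docs should show.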

R/evaluate-clusters.R

sjspielman · 2025-01-27T16:57:24Z

R/evaluate-clusters.R

+          df <- calculate_purity(
+            x = x,
+            cluster_df = df,
+            ...


The ... can definitely be kept (6539ac9), you just need a @param line for it in the roxygen.

In this case we should keep it since there are other arguments which might be needed

Yes, but in a previous conversation @jashapiro didn't think a ... was ever going to be used, so I removed it: #23 (comment)

ah, right! delete away.

R/evaluate-clusters.R

sjspielman · 2025-01-27T16:59:15Z

R/evaluate-clusters.R

+#'   algorithm = "walktrap",
+#'   weighting = "jaccard",
+#'   nn = c(10, 15, 25),
+#'   resolution = c(0.75, 1),


The resolution parameter isn't used with walktrap, so we'd want to remove it here or change the algorithm.

R/evaluate-clusters.R

sjspielman · 2025-01-27T17:00:37Z

R/evaluate-clusters.R

+#' @param cluster_results A single data frame or list of data frames obtained from
+#'   `rOpenScPCA::sweep_clusters()`. Each data frame in the list should contains


Here is where I would say that the single data frame is typically from rOpenScPCA::calculate_clusters().

sjspielman · 2025-01-27T17:03:08Z

R/evaluate-clusters.R

+#' `rOpenScPCA::calculate_silhouette()` and/or `rOpenScPCA::calculate_purity()`, based on
+#'   `calculate_silhouette()` functions output.


I'm not sure I understand this?

based on calculate_silhouette() functions output.

R/evaluate-clusters.R

cansavvy and others added 2 commits December 18, 2024 09:48

multi eval function

12e072e

[pre-commit.ci] auto fixes from pre-commit.com hooks

52bcb38

for more information, see https://pre-commit.ci

cansavvy commented Dec 18, 2024

Merge branch 'main' into cansavvy/multi_sweep

22aa816

cansavvy requested a review from sjspielman December 18, 2024 15:00

jashapiro reviewed Dec 18, 2024

jashapiro reviewed Dec 18, 2024

sjspielman reviewed Dec 18, 2024

cansavvy and others added 11 commits January 24, 2025 13:24

Update R/evaluate-clusters.R

b120d4a

Co-authored-by: Joshua Shapiro <[email protected]>

Update R/evaluate-clusters.R

5b54da8

Co-authored-by: Stephanie Spielman <[email protected]>

Update R/evaluate-clusters.R

51ea688

Co-authored-by: Stephanie Spielman <[email protected]>

Merge branch 'main' into cansavvy/multi_sweep

04237ec

[pre-commit.ci] auto fixes from pre-commit.com hooks

c717d79

Updates based on reviews

0985f59

Merge remote-tracking branch 'cansavvy/cansavvy/multi_sweep' into cansavvy/multi_sweep

b18e960

Throw in some tests

fcd96d2

[pre-commit.ci] auto fixes from pre-commit.com hooks

d1c7274

Update example

e68286f

Merge remote-tracking branch 'cansavvy/cansavvy/multi_sweep' into cansavvy/multi_sweep

b9677fa

cansavvy marked this pull request as ready for review January 24, 2025 18:58

cansavvy and others added 8 commits January 24, 2025 14:12

Add tests!

bc7955c

Merge branch 'cansavvy/tests' into cansavvy/multi_sweep

e1206a2

Put tests in another branch

9e355d5

[pre-commit.ci] auto fixes from pre-commit.com hooks

ccd2b3b

Update docs

81f3e07

[pre-commit.ci] auto fixes from pre-commit.com hooks

e91eaf3

Remove artifact test

49351b8

Oh no don't get rid of that file.

d16127e

cansavvy mentioned this pull request Jan 24, 2025

Tests for calculate_cell_cluster_metrics() function #29

Draft

cansavvy and others added 3 commits January 24, 2025 14:38

devtools::document() to please the testthat gods

78b0bd9

Update R/evaluate-clusters.R

ad3e57c

Co-authored-by: Stephanie Spielman <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

e19561a

cansavvy changed the title from "DRAFT: calculate_cell_cluster_metrics() function" to "Add calculate_cell_cluster_metrics() function" Jan 24, 2025

cansavvy added 2 commits January 24, 2025 14:47

Appease linter

c7870df

Forgot to remove ...

6539ac9

sjspielman reviewed Jan 27, 2025

sjspielman mentioned this pull request Jan 28, 2025

Remove seed from examples #30

Open

cansavvy commented Jan 28, 2025

cansavvy and others added 7 commits January 28, 2025 14:20

Apply suggestions from code review

7b55e3b

Co-authored-by: Stephanie Spielman <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

82dc613

change algorithm

e3b06e9

Merge remote-tracking branch 'cansavvy/cansavvy/multi_sweep' into cansavvy/multi_sweep

81e420f

calculate

aa6c950

Co-authored-by: Stephanie Spielman <[email protected]>

delete set the seed

5b6d053

Merge remote-tracking branch 'cansavvy/cansavvy/multi_sweep' into cansavvy/multi_sweep

a02b52f
