
Adding a warning message for when you only get 1 group in your clustering results #22

Open
cansavvy opened this issue Dec 18, 2024 · 1 comment


@cansavvy
Collaborator

Context

Working on #10 and learning how these functions work. They're great!

However, while playing around with the functions, I ran sweep_clusters() in a way that identified only one cluster. This may be because I'm new to these functions and how they work, but it took me a moment to figure out what the resulting error meant.

Reprex:

If you run this code:

set.seed(2024)
sce <- splatter::simpleSimulate(nGenes = 1000, verbose = FALSE) |>
  scater::logNormCounts() |>
  scater::runPCA(ncomponents = 10)

test_mat <- SingleCellExperiment::reducedDim(sce, "PCA")

sweep_list <- sweep_clusters(
  test_mat,
  algorithm = "louvain",
  objective_function = "modularity",
  resolution = 0.5
)
calculate_silhouette(
  x = test_mat,
  cluster_df = sweep_list[[1]]
)

You should get an error like:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': different row counts implied by arguments

This happens because, in lines 53 and 54 of evaluate-clusters.R, the cluster_df[[cluster_col]] vector passed into bluster::approxSilhouette() contains only one level (all 1's) 😄

  silhouette_df <- x |>
    bluster::approxSilhouette(cluster_df[[cluster_col]]) |>
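One quick way to confirm that a clustering collapsed to a single group, before it ever reaches approxSilhouette(), is to count the distinct cluster labels. This is a minimal base-R sketch; the toy `cluster_df` and the `"cluster"` column name are hypothetical stand-ins for `sweep_list[[1]]` and `cluster_col`, not the package's actual output format.

```r
# Hypothetical stand-in for one element of sweep_clusters() output
cluster_df <- data.frame(
  cell_id = paste0("cell_", 1:5),
  cluster = factor(rep(1, 5))
)
cluster_col <- "cluster"  # hypothetical column name

# Tabulate cells per cluster; a single entry means only one group was found
cluster_counts <- table(cluster_df[[cluster_col]])
print(cluster_counts)

# Number of distinct cluster labels
n_clusters <- length(unique(cluster_df[[cluster_col]]))
n_clusters
```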

Proposed solution

I'm not sure where the most appropriate place for a warning is. I'd think you would want to warn users as soon as a clustering produces only one group, but I could also see putting a warning in the calculate_silhouette() function. Side note: the purity function did not throw an error; it didn't mind that there was only one group.

I'm happy to add this somewhere, but I thought I should post about it first and see what y'all think.

@sjspielman
Member

sjspielman commented Dec 18, 2024

Thanks for raising this! I do recall having seen the silhouette width calculation fail* when there is only 1 cluster!

I think we should add checks for this to all of the evaluation functions, whether or not they currently run with a single cluster, since this will really help users put their results in context. Maybe:

  • For silhouette width, we should add a stopifnot() that confirms there are >1 clusters before doing any calculations, so we catch the error before it hits.
  • For purity and stability, we should add a warning() if there is only 1 cluster before calculations.
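The two checks proposed above could share one small helper. This is a minimal sketch; `check_n_clusters()` and its `on_single` argument are hypothetical names, not part of the package. It uses the named-argument form of stopifnot(), which requires R >= 4.0.0, so the error message is readable.

```r
# Hypothetical helper; not part of the package's current API.
# on_single = "error" suits silhouette width, which cannot be computed for
# a single cluster; "warning" suits purity and stability, which still run.
check_n_clusters <- function(clusters, on_single = c("error", "warning")) {
  on_single <- match.arg(on_single)
  n_clusters <- length(unique(clusters))
  if (on_single == "error") {
    # Named stopifnot() argument becomes the error message (R >= 4.0.0)
    stopifnot(
      "More than one cluster is required for this calculation." = n_clusters > 1
    )
  } else if (n_clusters < 2) {
    warning(
      "Fewer than two clusters were found; these results may be hard to interpret.",
      call. = FALSE
    )
  }
  invisible(n_clusters)
}

# Usage: two clusters pass silently; one cluster errors or warns
check_n_clusters(rep(1:2, each = 5))
err_msg <- tryCatch(
  check_n_clusters(rep(1, 10)),
  error = function(e) conditionMessage(e)
)
warn_msg <- tryCatch(
  check_n_clusters(rep(1, 10), on_single = "warning"),
  warning = function(w) conditionMessage(w)
)
```

Calling the helper at the top of each evaluation function keeps the single-cluster behavior consistent across them, while letting each function choose whether to stop or only warn.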

It could also be worth emitting a warning (or message?) in calculate_clusters() itself if only 1 cluster is found; this seems like an important piece of information to call out sooner rather than later.
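A clustering-time notice could look roughly like the sketch below. `notify_if_single_cluster()` is a hypothetical helper that calculate_clusters() might call before returning; the "cluster" column name is also an assumption. To keep the example self-contained, it uses base R hierarchical clustering on the built-in iris data rather than the package's graph-based clustering.

```r
# Hypothetical helper; calculate_clusters() could call this before returning
notify_if_single_cluster <- function(cluster_df, cluster_col = "cluster") {
  n_clusters <- length(unique(cluster_df[[cluster_col]]))
  if (n_clusters == 1) {
    message(
      "Only 1 cluster was identified; ",
      "consider adjusting the clustering parameters (e.g., resolution)."
    )
  }
  invisible(cluster_df)
}

# Toy example: hierarchical clustering on built-in data
x <- as.matrix(iris[, 1:4])
hc <- hclust(dist(x))
# Cutting above the tallest merge height puts every observation in one cluster
clusters <- cutree(hc, h = max(hc$height) + 1)
cluster_df <- data.frame(cell_id = rownames(x), cluster = factor(clusters))

# Capture the message text instead of printing it
msg <- tryCatch(
  notify_if_single_cluster(cluster_df),
  message = function(m) conditionMessage(m)
)
```

A message() rather than a warning() is one option here, since a single cluster is not necessarily an error at clustering time; either choice would surface the result sooner.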
