Working on #10 and learning how these functions work. They're great!
However, while playing around with the functions, I ran sweep_clusters() in such a way that only one cluster was identified. This may be because I am new to these functions and how they work, but it did take me a second to figure out what the error I was getting meant:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': different row counts implied by arguments
This happens at lines 53 and 54 of evaluate-clusters.R: the cluster_df[[cluster_col]] being passed into bluster::approxSilhouette() has only one level, all 1's 😄
silhouette_df <- x |>
bluster::approxSilhouette(cluster_df[[cluster_col]]) |>
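A quick way to confirm this condition before the silhouette step runs is to count the distinct cluster assignments. The sketch below uses only base R; the small cluster_df and the column name "cluster" are made-up stand-ins for the real objects, not the package's actual data.

```r
# Hypothetical stand-in for the real cluster_df: every cell landed in cluster 1
cluster_df <- data.frame(
  cell_id = paste0("cell_", 1:5),
  cluster = factor(rep(1, 5))
)
cluster_col <- "cluster"

# Count the distinct cluster assignments
n_clusters <- length(unique(cluster_df[[cluster_col]]))
print(n_clusters)

# Tabulate cells per cluster to see the single level directly
print(table(cluster_df[[cluster_col]]))
```

If n_clusters is 1, there is no "other" cluster to compare against, so a silhouette width is not defined for that clustering.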
Proposed solution
I'm not exactly sure where the appropriate place for a warning is. I would think that when a clustering comes back with only one group, you'd want to warn the user that this is the result they got. But I could also see putting a warning in the calculate_silhouette() function. Side note: the purity function did not throw an error. It didn't mind that there was only one group.
I'm happy to put this in here somewhere, but I thought I should post about it first and see what y'all think.
Thanks for raising this - I do recall seeing before that the silhouette width calculation fails* when there is only 1 cluster!
I do think we might want to add checks for this to all the evaluation functions, whether they run or not, since this will really help contextualize the results. Maybe:
For silhouette width, we should add a stopifnot() to confirm that there is >1 cluster before doing any calculations, to catch the error before it hits.
For purity and stability, we should add a warning() if there is only 1 cluster before calculations.
It could be worth a warning (message?) in calculate_clusters() itself too if only 1 cluster was found; this seems like an important piece of information to call out sooner rather than later.
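The checks suggested above could be sketched roughly as one shared helper. This is only an illustration under stated assumptions: check_cluster_count() and its on_single argument are hypothetical names, not existing functions in the package, and it uses stop() with a custom message rather than a bare stopifnot() so the user sees why the calculation was halted.

```r
# Hypothetical helper: the name and arguments are assumptions for illustration
check_cluster_count <- function(clusters,
                                on_single = c("stop", "warning", "message")) {
  on_single <- match.arg(on_single)
  n_clusters <- length(unique(clusters))

  # More than one cluster: nothing to report
  if (n_clusters > 1) {
    return(invisible(n_clusters))
  }

  msg <- "Only 1 cluster was found; evaluation metrics may be undefined or uninformative."
  switch(on_single,
    stop = stop(msg, call. = FALSE),       # e.g. before silhouette width
    warning = warning(msg, call. = FALSE), # e.g. before purity or stability
    message = message(msg)                 # e.g. right after clustering
  )
  invisible(n_clusters)
}

# Silent when there is more than one cluster
check_cluster_count(c(1, 1, 2))
```

Under this sketch, the silhouette function would call check_cluster_count(clusters, "stop"), the purity and stability functions would use "warning", and calculate_clusters() could use "message" or "warning" right after the clusters are assigned.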