Function to generate clustering stats for a set of parameters #10
Tagging @cansavvy in case you are interested!
Looking at the proposed output here, I think we might want to have one function that calculates both purity and silhouette width and puts them into a single data frame. If we aren't doing more than just running through the list and producing a new list, I'm not sure these functions would add much clarity beyond the "builtin" way I would process a list, namely with
In practice, I would expect to do something more like the following to facilitate summary stats and faceted plotting.
If we had a

Stability would still have to be a separate function, as the ARI there is calculated per bootstrap, not per cell.
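Here is a minimal base-R sketch of the combined approach suggested in this comment. One helper runs every per-cell metric on a single clustering result and merges the outputs into one data frame. Then `lapply()` plus `rbind()` stack the results from every parameter set into a single long table for summary stats or faceting. It assumes each element of the `sweep_clusters()` output is a list holding a `params` list and a `clusters` data frame with `cell_id` and `cluster` columns. That structure, the helper name `calculate_cluster_stats()`, and the stub metrics are all hypothetical stand-ins for the real `calculate_purity()` and `calculate_silhouette()`.

```r
# Hypothetical helper: run each per-cell metric on one clustering result and
# merge the outputs into a single data frame keyed by cell and cluster.
calculate_cluster_stats <- function(clusters, metric_funs) {
  per_metric <- lapply(metric_funs, function(f) f(clusters))
  Reduce(function(a, b) merge(a, b, by = c("cell_id", "cluster")), per_metric)
}

# Stub metrics (NOT real purity or silhouette width): each returns one row per
# cell with a placeholder column, standing in for calculate_purity() and
# calculate_silhouette().
stub_purity <- function(clusters) {
  data.frame(cell_id = clusters$cell_id, cluster = clusters$cluster,
             purity = NA_real_)
}
stub_width <- function(clusters) {
  data.frame(cell_id = clusters$cell_id, cluster = clusters$cluster,
             width = NA_real_)
}

# Assumed (hypothetical) shape of the sweep_clusters() output:
# one list per parameter set.
sweep_results <- list(
  list(params = list(k = 10),
       clusters = data.frame(cell_id = c("a", "b", "c"), cluster = c(1, 1, 2))),
  list(params = list(k = 20),
       clusters = data.frame(cell_id = c("a", "b", "c"), cluster = c(1, 2, 2)))
)

# Process the list with base R: compute stats per run, tag each row with the
# run's parameters, then stack everything into one long data frame.
all_stats <- do.call(rbind, lapply(sweep_results, function(run) {
  stats <- calculate_cluster_stats(run$clusters, list(stub_purity, stub_width))
  for (p in names(run$params)) {
    stats[[p]] <- rep(run$params[[p]], nrow(stats))
  }
  stats
}))
```

Because `all_stats` has one row per cell per parameter set, it can go straight into `aggregate()` for summaries or into a ggplot2 plot with `facet_wrap(~ k)`. Stability would stay outside this helper, since it is not a per-cell value.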
We currently support generating clustering results using a range of parameters with the `sweep_clusters()` function, but the functions in `calculate-clusters.R` only support calculating metrics for one set of clustering results. To make the plots described in #9, it might be helpful to have a function that calculates one or all of the metrics on the clustering output from `sweep_clusters()`.

I think this would take the following arguments:
- The output from `sweep_clusters()`.
- The metrics to calculate. For example, `c("purity", "width")` would run both `calculate_silhouette()` and `calculate_purity()` on all data frames/clustering results. Alternatively, we could use a flag for each metric: width, purity, and stability.

The output would be a list of data frames with one data frame for each metric. That means there would be one data frame containing the purity results for all clustering results output from `sweep_clusters()`, one for width, and one for stability. These data frames could then be provided as input to the plotting function described in #9.
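The proposed function could be sketched roughly as below. This is base R only, and it assumes each element of the `sweep_clusters()` output is a list with a `params` list and a `clusters` data frame. The function name `calculate_sweep_metrics()`, the `metric_funs` argument, and the stub metric are hypothetical. In practice the entries of `metric_funs` would wrap `calculate_purity()`, `calculate_silhouette()`, and a stability function, whose real signatures may differ.

```r
# Hypothetical sketch of the proposed function: for each requested metric,
# run it on every clustering result from sweep_clusters() and stack the
# results into one data frame per metric.
calculate_sweep_metrics <- function(sweep_results, metrics, metric_funs) {
  # Every requested metric needs a matching function in metric_funs
  stopifnot(all(metrics %in% names(metric_funs)))
  results <- lapply(metrics, function(metric) {
    per_run <- lapply(sweep_results, function(run) {
      df <- metric_funs[[metric]](run$clusters)
      # Record which parameter set produced these rows
      for (p in names(run$params)) {
        df[[p]] <- rep(run$params[[p]], nrow(df))
      }
      df
    })
    do.call(rbind, per_run)
  })
  names(results) <- metrics
  results
}

# Stub metric (NOT a real purity or width calculation): one row per cell
# with a placeholder score, used only to show the output shape.
stub_metric <- function(clusters) {
  data.frame(cell_id = clusters$cell_id, cluster = clusters$cluster,
             score = NA_real_)
}

# Assumed (hypothetical) shape of the sweep_clusters() output
sweep_results <- list(
  list(params = list(k = 10),
       clusters = data.frame(cell_id = c("a", "b", "c"), cluster = c(1, 1, 2))),
  list(params = list(k = 20),
       clusters = data.frame(cell_id = c("a", "b", "c"), cluster = c(1, 2, 2)))
)

metric_results <- calculate_sweep_metrics(
  sweep_results,
  metrics = c("purity", "width"),
  metric_funs = list(purity = stub_metric, width = stub_metric)
)
```

The stacking step only relies on `nrow(df)`, so a stability entry could return one row per bootstrap replicate (where the ARI is calculated) rather than one per cell, and still end up in its own data frame in the returned list.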