diff --git a/R/calculate-clusters.R b/R/calculate-clusters.R
index 9daadb8..0fc26dd 100644
--- a/R/calculate-clusters.R
+++ b/R/calculate-clusters.R
@@ -1,30 +1,35 @@
 #' Calculate graph-based clusters from a provided matrix
 #'
-#' This function is provided to simplify application of bluster package clustering functions on OpenScPCA data.
-#' In particular, this function runs `bluster::clusterRows()` with the `bluster::NNGraphParam()` function on a
+#' @description This function is provided to simplify application of bluster package
+#' clustering functions on OpenScPCA data. In particular, this function runs
+#' `bluster::clusterRows()` with the `bluster::NNGraphParam()` function on a
 #' principal components matrix, provided either directly or via single-cell object.
-#' Note that defaults for some arguments may differ from the `bluster::NNGraphParam()` defaults.
-#' Specifically, the clustering algorithm defaults to "louvain" and the weighting scheme to "jaccard"
-#' to align with common practice in scRNA-seq analysis.
-#'
-#' @param x An object containing PCs that clustering can be performed in. This can be either a SingleCellExperiment
-#' object, a Seurat object, or a matrix where columns are PCs and rows are cells.
-#' If a matrix is provided, it must have row names of cell ids (e.g., barcodes).
-#' @param algorithm Clustering algorithm to use. Must be one of "louvain" (default), "walktrap", or "leiden".
+#'
+#' Note that defaults for some arguments may differ from the `bluster::NNGraphParam()`
+#' defaults. #' Specifically, the clustering algorithm defaults to "louvain" and
+#' the weighting scheme to "jaccard" to align with common practice in scRNA-seq analysis.
+#'
+#' @param x An object containing PCs that clustering can be performed in. This can be
+#' either a SingleCellExperiment object, a Seurat object, or a matrix where columns
+#' are PCs and rows are cells. If a matrix is provided, it must have row names of
+#' cell ids (e.g., barcodes).
+#' @param algorithm Clustering algorithm to use. Must be one of "louvain" (default),
+#' "walktrap", or "leiden".
 #' @param weighting Weighting scheme to use. Must be one of "jaccard" (default), "rank", or "number"
 #' @param nn Number of nearest neighbors. The default is 10.
-#' @param resolution Resolution parameter used by Louvain and Leiden clustering only. Default is 1.
-#' @param objective_function Leiden-specific parameter for whether to use the Constant Potts Model ("CPM"; default)
-#' or "modularity"
-#' @param cluster_args List of additional arguments to pass to the chosen clustering function.
-#' Only single values for each argument are supported (no vectors or lists).
-#' See `igraph` documentation for details on each clustering function:
+#' @param resolution Resolution parameter used by Louvain and Leiden clustering only.
+#' The default is 1.
+#' @param objective_function Leiden-specific parameter for whether to use the
+#' Constant Potts Model ("CPM"; default) or "modularity".
+#' @param cluster_args List of additional arguments to pass to the chosen clustering
+#' function. Only single values for each argument are supported (no vectors or lists).
+#' See `igraph` documentation for details on each clustering function:
 #' @param threads Number of threads to use. The default is 1.
 #' @param seed Random seed to set for clustering.
 #' @param pc_name Name of principal components slot in provided object.
-#' This argument is only used if a SingleCellExperiment or Seurat object is provided.
-#' If not provided, the SingleCellExperiment object name will default to "PCA" and the
-#' Seurat object name will default to "pca".
+#' This argument is only used if a SingleCellExperiment or Seurat object is provided.
+#' If not provided, the SingleCellExperiment object name will default to "PCA" and the
+#' Seurat object name will default to "pca".
 #'
 #' @return A data frame of cluster results with columns `cell_id` and `cluster`.
 #' Additional columns represent algorithm parameters and include at least: `algorithm`, `weighting`, and `nn`.
@@ -155,7 +160,7 @@ calculate_clusters <- function(
 #' Extract a principal components (PC) matrix from either a SingleCellExperiment
 #' or a Seurat object.
 #'
-#' This function first determines if the provided object is a SingleCellExperiment or
+#' @description This function first determines if the provided object is a SingleCellExperiment or
 #' Seurat object, and then extract the PC matrix. If no name for the PC matrix is provided,
 #' this function will use "PCA" for SingleCellExperiment objects, and
 #' "pca" for Seurat objects.
diff --git a/R/convert-gene-ids.R b/R/convert-gene-ids.R
index dd10d32..ea95280 100644
--- a/R/convert-gene-ids.R
+++ b/R/convert-gene-ids.R
@@ -1,6 +1,6 @@
 #' Convert Ensembl gene ids to gene symbols based on reference gene lists
 #'
-#' The SingleCellExperiment objects produced as part of ScPCA are indexed by
+#' @description The SingleCellExperiment objects produced as part of ScPCA are indexed by
 #' Ensembl gene ids, as those are more stable than gene symbols. However,
 #' for many applications gene symbols are useful. This function provides
 #' simple conversion of Ensembl gene ids to gene symbols based on either the
@@ -105,7 +105,7 @@ ensembl_to_symbol <- function(
 
 #' Set the row names of an ScPCA SingleCellExperiment object to gene symbols
 #'
-#' The SingleCellExperiment objects produced as part of ScPCA are indexed by
+#' @description The SingleCellExperiment objects produced as part of ScPCA are indexed by
 #' Ensembl gene ids, as those are more stable than gene symbols. However,
 #' for many applications gene symbols are useful. This function converts the
 #' row names (indexes) of a SingleCellExperiment object to gene symbols based on the
diff --git a/R/data.R b/R/data.R
index b1f2690..ddbba44 100644
--- a/R/data.R
+++ b/R/data.R
@@ -3,7 +3,7 @@
 #' Conversion table for Ensembl gene ids and gene symbols
 #'
 #'
-#' This table includes the mapping for gene ids to gene symbols from different
+#' @description This table includes the mapping for gene ids to gene symbols from different
 #' reference genome gene annotation lists.
 #' Included are the original gene symbols and the modified gene symbols that
 #' are created when running the `make.unique()` function, as is done when
diff --git a/R/evaluate-clusters.R b/R/evaluate-clusters.R
index 6592d06..0c0df7f 100644
--- a/R/evaluate-clusters.R
+++ b/R/evaluate-clusters.R
@@ -1,6 +1,6 @@
 #' Calculate the silhouette width of clusters
 #'
-#' This function uses the `bluster::approxSilhouette()` function to calculate the
+#' @description This function uses the `bluster::approxSilhouette()` function to calculate the
 #' silhouette width for a clustering result. These results can be used downstream to
 #' calculate the average silhouette width, a popular metric for cluster evaluation.
 #'
@@ -73,7 +73,7 @@ calculate_silhouette <- function(
 
 #' Calculate the neighborhood purity of clusters
 #'
-#' This function uses the `bluster::neighborPurity()` function to calculate the
+#' @description This function uses the `bluster::neighborPurity()` function to calculate the
 #' neighborhood purity values for a clustering result.
 #'
 #' @param x Either a matrix of principal components (PCs), or a SingleCellExperiment
@@ -142,7 +142,7 @@ calculate_purity <- function(
 
 #' Calculate cluster stability using the Adjusted Rand Index (ARI)
 #'
-#' This function generates and clusters, using provided parameters, bootstrap
+#' @description This function generates and clusters, using provided parameters, bootstrap
 #' replicates calculates the Adjusted Rand Index (ARI) between each set of bootstrapped
 #' clusters and the original provided clusters. ARI measures similarity between different
 #' cluster results, where a value of 0 indicates an entirely random relationship between
diff --git a/R/make-seurat.R b/R/make-seurat.R
index e18c96f..cabfd9d 100644
--- a/R/make-seurat.R
+++ b/R/make-seurat.R
@@ -1,6 +1,6 @@
 #' Convert an SCE object to Seurat
 #'
-#' Converts an ScPCA SingleCellExperiment (SCE) object to Seurat format. This is
+#' @description Converts an ScPCA SingleCellExperiment (SCE) object to Seurat format. This is
 #' primarily a wrapper around Seurat::as.Seurat() with some additional steps to
 #' include ScPCA metadata and options for converting the feature index from
 #' Ensembl gene ids to gene symbols.
diff --git a/R/sum-duplicate-genes.R b/R/sum-duplicate-genes.R
index 44dd071..0f298a1 100644
--- a/R/sum-duplicate-genes.R
+++ b/R/sum-duplicate-genes.R
@@ -1,6 +1,6 @@
 #' Sum counts for genes with duplicate names in a SingleCellExperiment object.
 #'
-#' Genes with the same name are merged by summing their raw expression counts.
+#' @description Genes with the same name are merged by summing their raw expression counts.
 #' When multiple Ensembl gene IDs are associated with the same gene symbol,
 #' identifier conversion can result in duplicate gene names. This function
 #' resolves such duplicates by summing the expression values for each duplicate
diff --git a/R/sweep-clusters.R b/R/sweep-clusters.R
index 62a44dc..2e4b8f8 100644
--- a/R/sweep-clusters.R
+++ b/R/sweep-clusters.R
@@ -1,16 +1,22 @@
 #' Calculate clusters across a set of parameters
 #'
-#' This function can be used to perform reproducible clustering while varying a set of parameters.
+#' @description This function can be used to perform reproducible clustering while varying a set of parameters.
 #' Multiple values can be provided for any of:
-#' - The algorithm (`algorithm`)
-#' - The weighting scheme (`weighting`)
-#' - Number of nearest neighbors (`nn`)
-#' - The resolution parameter (`resolution`)
-#' - The objective function parameter (`objective_function`)
+#'
+#' - The algorithm (`algorithm`)
+#'
+#' - The weighting scheme (`weighting`)
+#'
+#' - Number of nearest neighbors (`nn`)
+#'
+#' - The resolution parameter (`resolution`)
+#'
+#' - The objective function parameter (`objective_function`).
 #'
 #' For each algorithm specified, all parameters possible to use with that
 #' algorithm will be systematically varied. This function does not accept additional
 #' parameters besides those listed above.
+#'
 #' Note that defaults for some arguments may differ from the `bluster::NNGraphParam()` defaults.
 #' Specifically, the clustering algorithm defaults to "louvain" and the weighting scheme to "jaccard"
 #' to align with common practice in scRNA-seq analysis.
diff --git a/man/calculate_clusters.Rd b/man/calculate_clusters.Rd
index 5e93e9d..d547b47 100644
--- a/man/calculate_clusters.Rd
+++ b/man/calculate_clusters.Rd
@@ -18,23 +18,26 @@ calculate_clusters(
 )
 }
 \arguments{
-\item{x}{An object containing PCs that clustering can be performed in. This can be either a SingleCellExperiment
-object, a Seurat object, or a matrix where columns are PCs and rows are cells.
-If a matrix is provided, it must have row names of cell ids (e.g., barcodes).}
+\item{x}{An object containing PCs that clustering can be performed in. This can be
+either a SingleCellExperiment object, a Seurat object, or a matrix where columns
+are PCs and rows are cells. If a matrix is provided, it must have row names of
+cell ids (e.g., barcodes).}
 
-\item{algorithm}{Clustering algorithm to use. Must be one of "louvain" (default), "walktrap", or "leiden".}
+\item{algorithm}{Clustering algorithm to use. Must be one of "louvain" (default),
+"walktrap", or "leiden".}
 
 \item{weighting}{Weighting scheme to use. Must be one of "jaccard" (default), "rank", or "number"}
 
 \item{nn}{Number of nearest neighbors. The default is 10.}
 
-\item{resolution}{Resolution parameter used by Louvain and Leiden clustering only. Default is 1.}
+\item{resolution}{Resolution parameter used by Louvain and Leiden clustering only.
+The default is 1.}
 
-\item{objective_function}{Leiden-specific parameter for whether to use the Constant Potts Model ("CPM"; default)
-or "modularity"}
+\item{objective_function}{Leiden-specific parameter for whether to use the
+Constant Potts Model ("CPM"; default) or "modularity".}
 
-\item{cluster_args}{List of additional arguments to pass to the chosen clustering function.
-Only single values for each argument are supported (no vectors or lists).
+\item{cluster_args}{List of additional arguments to pass to the chosen clustering
+function. Only single values for each argument are supported (no vectors or lists).
 See `igraph` documentation for details on each clustering function: }
 
 \item{threads}{Number of threads to use. The default is 1.}
@@ -53,12 +56,14 @@ A data frame of cluster results with columns `cell_id` and `cluster`.
 and Leiden clustering will further include `objective_function`.
 }
 \description{
-This function is provided to simplify application of bluster package clustering functions on OpenScPCA data.
-In particular, this function runs `bluster::clusterRows()` with the `bluster::NNGraphParam()` function on a
+This function is provided to simplify application of bluster package
+clustering functions on OpenScPCA data. In particular, this function runs
+`bluster::clusterRows()` with the `bluster::NNGraphParam()` function on a
 principal components matrix, provided either directly or via single-cell object.
-Note that defaults for some arguments may differ from the `bluster::NNGraphParam()` defaults.
-Specifically, the clustering algorithm defaults to "louvain" and the weighting scheme to "jaccard"
-to align with common practice in scRNA-seq analysis.
+
+Note that defaults for some arguments may differ from the `bluster::NNGraphParam()`
+defaults. #' Specifically, the clustering algorithm defaults to "louvain" and
+the weighting scheme to "jaccard" to align with common practice in scRNA-seq analysis.
 }
 \examples{
 \dontrun{
diff --git a/man/calculate_stability.Rd b/man/calculate_stability.Rd
index 145931b..1522d95 100644
--- a/man/calculate_stability.Rd
+++ b/man/calculate_stability.Rd
@@ -62,8 +62,7 @@ replicates calculates the Adjusted Rand Index (ARI) between each set of bootstra
 clusters and the original provided clusters. ARI measures similarity between different
 cluster results, where a value of 0 indicates an entirely random relationship between
 results, and a value of 1 indicates perfect agreement.
-}
-\details{
+
 When assessing stability, you should specify the same clustering parameters here as
 were used to calculate the original clusters.
diff --git a/man/ensembl_to_symbol.Rd b/man/ensembl_to_symbol.Rd
index c7f1f9e..da314f8 100644
--- a/man/ensembl_to_symbol.Rd
+++ b/man/ensembl_to_symbol.Rd
@@ -47,8 +47,7 @@ simple conversion of Ensembl gene ids to gene symbols based on either the
 ScPCA reference gene list or a 10x reference gene list as used by Cell Ranger.
 Alternatively, a SingleCellExperiment object with gene ids and gene symbols
 stored in the row data (as those provided by ScPCA) can be used as the reference.
-}
-\details{
+
 The gene symbols can either be made unique (as would be done if read in by
 Seurat) or left as is.
 }
diff --git a/man/sce_to_seurat.Rd b/man/sce_to_seurat.Rd
index b8fc032..b692560 100644
--- a/man/sce_to_seurat.Rd
+++ b/man/sce_to_seurat.Rd
@@ -39,8 +39,7 @@ Converts an ScPCA SingleCellExperiment (SCE) object to Seurat format. This is
 primarily a wrapper around Seurat::as.Seurat() with some additional steps to
 include ScPCA metadata and options for converting the feature index from
 Ensembl gene ids to gene symbols.
-}
-\details{
+
 If present, reduced dimensions from SCE objects will be included, renamed to
 match Seurat default naming.
 }
diff --git a/man/sce_to_symbols.Rd b/man/sce_to_symbols.Rd
index 504c747..1e60579 100644
--- a/man/sce_to_symbols.Rd
+++ b/man/sce_to_symbols.Rd
@@ -46,8 +46,7 @@ It is also possible to use an alternative reference, such as the default ScPCA
 reference gene sets or the reference gene sets provided by 10x Genomics for use
 with Cell Ranger. Values for the 10x-provided 2020 and 2024 references are
 available.
-}
-\details{
+
 By default, duplicate gene symbols are left as is, but can be made unique (as
 would be done by Seurat) by setting the `unique` argument to TRUE.
diff --git a/man/sum_duplicate_genes.Rd b/man/sum_duplicate_genes.Rd
index 7fc167c..a5b1869 100644
--- a/man/sum_duplicate_genes.Rd
+++ b/man/sum_duplicate_genes.Rd
@@ -37,8 +37,7 @@ resolves such duplicates by summing the expression values for each duplicate
 gene name, which may be justified if the different Ensembl gene IDs share
 substantial sequence identity, which could make separate quantification of
 the two genes less reliable.
-}
-\details{
+
 The rowData for the summed SingleCellExperiment object is updated to reflect
 the new set of gene names. In each case, the first row for any duplicated id
 is retained. This may mean that for gene symbols that correspond to multiple
diff --git a/man/sweep_clusters.Rd b/man/sweep_clusters.Rd
index 93d79f8..9c93deb 100644
--- a/man/sweep_clusters.Rd
+++ b/man/sweep_clusters.Rd
@@ -56,16 +56,21 @@ A list of data frames from performing clustering across all parameter combinatio
 \description{
 This function can be used to perform reproducible clustering while varying a set of parameters.
 Multiple values can be provided for any of:
- - The algorithm (`algorithm`)
- - The weighting scheme (`weighting`)
- - Number of nearest neighbors (`nn`)
- - The resolution parameter (`resolution`)
- - The objective function parameter (`objective_function`)
-}
-\details{
+
+- The algorithm (`algorithm`)
+
+- The weighting scheme (`weighting`)
+
+- Number of nearest neighbors (`nn`)
+
+- The resolution parameter (`resolution`)
+
+- The objective function parameter (`objective_function`).
+
 For each algorithm specified, all parameters possible to use with that
 algorithm will be systematically varied. This function does not accept additional
 parameters besides those listed above.
+
 Note that defaults for some arguments may differ from the `bluster::NNGraphParam()` defaults.
 Specifically, the clustering algorithm defaults to "louvain" and the weighting scheme to "jaccard"
 to align with common practice in scRNA-seq analysis.