diff --git a/vignettes/dimension_reduction.Rmd b/vignettes/dimension_reduction.Rmd
index ae0348053..691377c3f 100644
--- a/vignettes/dimension_reduction.Rmd
+++ b/vignettes/dimension_reduction.Rmd
@@ -57,15 +57,27 @@ These methods are typically used early in the analysis pipeline after filtering,
 - Preserve global structure of the data
 - Output can be used directly for downstream analyses

+**Features to use**
+
 Which features to include when calculating these dimension reductions has a large effect on the information extracted. Highly variable features will focus on variation from features with the largest expression variation. Spatially variable features will focus on features with spatially organized expression. However, when there are not many features (only hundreds of features), it is a better idea to include all features than to use a subset.

+**Centering and Scaling**
+
+These dimension reductions should generally be centered and scaled for the downstream steps, but it is important not to accidentally perform centering and scaling multiple times. Whether it is necessary depends on how the expression information was normalized prior to this step. For Giotto's provided normalization methods from `normalizeGiotto()`:
+
+- giotto `'standard'` normalization -- needs center and scale (default behavior)
+- `'pearson'` (can be considered already centered and scaled) -- do not center and scale again
+- `'quantile'` (can be considered already centered and scaled) -- do not center and scale again

 ## Principal Component Analysis (PCA)

 A linear dimensionality reduction technique that identifies the directions of maximum variance in high-dimensional data and projects it onto a lower-dimensional subspace. Giotto provides several implementations, but the default is with irlba through BiocSingular. Instead of calculating all PCs, Giotto only calculates the first 100 by default.

 ```{r, eval=FALSE}
-# would 'hvf' (highly variable feats) by default if available, but we pass NULL to `feats_to_use` to use all of them for this mini dataset
-g <- runPCA(g, feats_to_use = NULL)
+# - runPCA() uses 'hvf' (highly variable feats) by default if available
+# but we pass NULL to `feats_to_use` to use all of them for this mini dataset
+# - runPCA() uses the 'normalized' expression values by default and performs centering and scaling by default.
+# Set `center = FALSE` and `scale = FALSE` if not needed.
+g <- runPCA(g, feats_to_use = NULL)
 dimPlot2D(g, dim_reduction_to_use = "pca")
 ```
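
Note on the centering and scaling guidance in this patch: the point that already-standardized input should not be standardized again inside the PCA call can be illustrated with base R alone. The sketch below is not Giotto code. It uses `stats::prcomp()` as a stand-in for `runPCA()`, and the matrix `x` is made-up toy data. It only shows that letting the PCA routine center and scale gives the same component variances as standardizing beforehand and then switching both options off.

```r
set.seed(1)

# Hypothetical toy data: 50 observations x 5 features, not a Giotto object
x <- matrix(rnorm(250, mean = 5, sd = 2), nrow = 50)

# Case 1: raw input, the PCA call performs centering and scaling
pca_auto <- prcomp(x, center = TRUE, scale. = TRUE)

# Case 2: input standardized beforehand, so both options are turned off
x_scaled <- scale(x, center = TRUE, scale = TRUE)
pca_pre <- prcomp(x_scaled, center = FALSE, scale. = FALSE)

# Both routes yield the same standard deviations for the components
same_sdev <- isTRUE(all.equal(pca_auto$sdev, pca_pre$sdev))
print(same_sdev)
```

The same reasoning is what the new vignette text applies to `'pearson'` and `'quantile'` normalized values: if the input can be treated as already centered and scaled, the PCA step should leave it alone.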