projection (prediction) and vector sum (interpolation) geoms #4
First of all, thank you for this fantastic package (as well as for ggalluvial!). A dropped-perpendicular feature would be a great addition: combined with the calibrated axes, the perpendiculars really help explain the meaning of a biplot. I've been creating them with the calibrate package in base R and have taken a stab at using calibrate's calculations in ggplot2, but of course it would be nice to have a smoother interface.
library(calibrate)
#> Loading required package: MASS
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca <- prcomp(df)
scores <- pca$x
loadings <- pca$rotation
axisrange <- max(df[,"x"]) - min(df[,"x"])
ticklab <- round(seq(min(df[,"x"])-.5*axisrange,
max(df[,"x"])+.5*axisrange,
by = axisrange/2))
plot(scores, pch = 16, asp = 1, ylim = c(-5, 5))
text(x = scores[,1]-.5, y = scores[,2], labels = 1:nrow(scores))
cal <- calibrate(g = loadings[,"PC1"],
y = df[,"x"] - mean(df[,"x"]),
tm = ticklab - mean(df[,"x"]),
Fr = scores,
tmlab = ticklab,
tl = .2,
lm = TRUE,
axislab="x",
where = 1,
labpos = 4,
dp = TRUE)
#> ---------- Calibration Results for x ---------------------
#> Length of 1 unit of the original variable = 1
#> Angle = 76.3 degrees
#> Optimal calibration factor = 1
#> Used calibration factor = 1
#> Goodness-of-fit = 1
#> Goodness-of-scale = 1
#> ------------------------------------------------------------
arrows(0, 0, loadings[,1], loadings[,2], length = .1, col = "red",
lwd = 1.5)
Created on 2021-08-25 by the reprex package (v2.0.1)
library(calibrate)
#> Loading required package: MASS
library(ggplot2)
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca <- prcomp(df)
scores <- pca$x
loadings <- pca$rotation
axisrange <- max(df[,"x"]) - min(df[,"x"])
ticklab <- round(seq(min(df[,"x"])-.5*axisrange,
max(df[,"x"])+.5*axisrange,
by = axisrange/2))
cal <- calibrate(g = loadings[,"PC1"],
y = df[,"x"] - mean(df[,"x"]),
tm = ticklab - mean(df[,"x"]),
Fr = scores,
tmlab = ticklab,
tl = .3,
graphics = FALSE)
#> ---------- Calibration Results for ----------------------
#> Length of 1 unit of the original variable = 1
#> Angle = 76.3 degrees
#> Optimal calibration factor = 1
#> Used calibration factor = 1
#> Goodness-of-fit = 1
#> Goodness-of-scale = 1
#> ------------------------------------------------------------
dfpoints <- data.frame(scores)
dfpoints$xsdrop <- cal$Fpr[,1]
dfpoints$ysdrop <- cal$Fpr[,2]
dfaxis <- data.frame(x = cal$M[1, 1], y = cal$M[1, 2],
xend = cal$M[nrow(cal$M),1], yend = cal$M[nrow(cal$M), 2])
dfticks <- data.frame(cal$M, cal$Mn, ticklab)
colnames(dfticks) <- c("x", "y", "xend", "yend", "label")
dfarrows <- data.frame(xend = loadings[,1], yend = loadings[,2])
ggplot(dfpoints, aes(x = PC1, y = PC2)) +
geom_point() +
geom_segment(data = dfticks, aes(x = x, y = y, xend = xend, yend = yend), col = "blue") +
geom_text(data = dfticks, aes(x = xend, y = yend, label = label), nudge_x = .3) +
geom_segment(data = dfaxis, aes(x = x, y = y, xend = xend, yend = yend), col = "blue") +
geom_segment(data = dfpoints, aes(x = PC1, y = PC2, xend = xsdrop, yend = ysdrop),
lty = "dashed", col = "cornflowerblue") +
geom_segment(data = dfarrows, aes(x = 0, y = 0, xend = xend, yend = yend),
arrow = arrow(length = unit(.03, "npc")), color = "red", lwd = 1.5) + coord_equal()
Created on 2021-08-25 by the reprex package (v2.0.1)
The closest I can get with ordr:
library(ggplot2)
library(ggbiplot)
#> Loading required package: plyr
#> Loading required package: scales
#> Loading required package: grid
library(ordr)
#>
#> Attaching package: 'ordr'
#> The following object is masked from 'package:ggbiplot':
#>
#> ggbiplot
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca_ordr <- ordinate(df, cols = 1:2, model = ~ prcomp(., scale. = TRUE))
ggbiplot(pca_ordr) +
xlim(c(-2, 2)) +
ylim(c(-2, 2)) +
geom_rows_point() +
geom_rows_text(aes(label = 1:5), nudge_x = .2) +
geom_cols_vector(color = "red", lwd = 1.5) +
geom_cols_axis() +
geom_cols_axis_ticks(aes(center = .center, scale = .scale)) +
geom_cols_axis_text(aes(center = .center, scale = .scale)) +
geom_cols_axis_label(aes(label = .name))
#> Warning: Ignoring unknown aesthetics: center, scale
#> Warning: Ignoring unknown aesthetics: center, scale
#> Warning: Ignoring unknown aesthetics: label
Created on 2021-08-25 by the reprex package (v2.0.1)
I should also note that it would be helpful to be able to plot one axis at a time, since the graph gets messy with multiple axes, though that might be a challenge in the ggplot2 framework.
@jtr13 thank you for these details! I'm going to reassess which features should go into a first release, and you've got me thinking that the projection geom should be put back on that list. (Meanwhile, the …) Ditto the ability to specify which row or column elements should be plotted by each layer, as you mention at the end. An experimental solution is the
Correction and clarification: Thanks to Gower &al's books, i should have remembered that projection onto an axis is a prediction step that calls for a prediction biplot, whereas vector addition along the axes is an interpolation step that makes use of the (currently implemented) interpolation biplot. (Interpolation axis unit vectors are located analogously to case markers, whereas prediction axes must be rescaled in the linear setting and completely redrawn in the non-linear setting.) This issue should really be about rendering both sets of steps, each being appropriate for one type of biplot. Vector addition should come first, for the first CRAN release.
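To make the distinction concrete, here is a minimal base-R sketch (not ordr code) for a centred PCA. It assumes that the axis vectors are the rows of prcomp()'s rotation matrix truncated to r = 2 columns; the simulated data and object names are illustrative only. Interpolation reproduces each case marker as a vector sum along the axes, whereas prediction drops a perpendicular onto an axis whose markings are rescaled by $1/|v_j|^2$:

```r
# Sketch (base R only): interpolation vs. prediction in a linear PCA biplot
# of rank r = 2 fitted to p = 4 simulated variables.
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))
pca <- prcomp(X)
r <- 2
V <- pca$rotation[, seq_len(r)]  # p x r; row j is the axis vector v_j
U <- pca$x[, seq_len(r)]         # n x r; row i is the case marker u_i
Xc <- sweep(X, 2, pca$center)    # centred data

# Interpolation (vector sum): u_i = sum_j x_ij * v_j, so interpolation
# axis j marks the value mu at mu * v_j.
interpolated <- Xc %*% V

# Prediction (projection): the foot of the perpendicular from u_i onto the
# line through v_j is (u_i . v_j / |v_j|^2) * v_j. Prediction axis j marks
# mu at mu * v_j / |v_j|^2, so that foot reads mu = u_i . v_j, the rank-r
# fitted (centred) value of x_ij.
predicted <- U %*% t(V)
axis_len2 <- rowSums(V^2)        # |v_j|^2 <= 1; all equal 1 only if r = p

# Reading the same feet off the interpolation markings would instead give
# u_i . v_j / |v_j|^2, which is too large in magnitude whenever |v_j| < 1.
misread <- sweep(predicted, 2, axis_len2, "/")
```

With r = p, every $|v_j| = 1$ and the two calibrations coincide. That is why the distinction is easy to miss in two-variable examples like the one at the top of this thread.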
In the conventional use of PCA, with rows of cases and columns of variables, the vector sum geom could look like … It shouldn't be much extra work to include an option
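Here is a rough sketch of the data that such a vector sum layer might draw, using the two-variable example from the first comment. It produces one head-to-tail segment per variable, where the step for variable j is $x_{ij} v_j$ along the interpolation axis, so the last segment ends at the case marker. The helper vector_sum_segments() is hypothetical and is not part of ordr:

```r
# Hypothetical helper (not an ordr function): head-to-tail segments whose
# vector sum carries the origin to the interpolated marker of one case.
vector_sum_segments <- function(x_centered, V) {
  vars <- rownames(V)
  V <- unname(as.matrix(V))[, 1:2, drop = FALSE]  # use the first two axes
  steps <- x_centered * V                # row j is x_j * v_j
  ends <- apply(steps, 2, cumsum)        # running sums s_1, ..., s_p
  starts <- rbind(c(0, 0), ends[-nrow(ends), , drop = FALSE])
  data.frame(variable = vars,
             x = starts[, 1], y = starts[, 2],
             xend = ends[, 1], yend = ends[, 2])
}

# The data from the first comment in this thread
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca <- prcomp(df)
xc <- unlist(df[2, ]) - pca$center     # centred values of case 2
segs <- vector_sum_segments(xc, pca$rotation)
```

The resulting data frame could be drawn with ggplot2's geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend)), for example with arrow() heads.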
Experimental work is ongoing in the
Update: Based on further reading, and on the counterintuitiveness and complexity of the experimental implementation, i think both of these annotations should work as follows:
The reason for a change in approach is that the 'tbl_ord' class is designed for display (print, summary, annotation, plot), and I think this deserves to wait until feedback is collected from a first CRAN release. A compensatory benefit will be the new generics and methods themselves, which might be used in base R biplots or for other purposes. To my knowledge, there are no standalone implementations of them; they are only built inaccessibly into visualization tools like those of Gower &al.
The … I think doing this properly (specifically, determining the offset and the extent of each axis from the range of plotted elements from the other matrix factor rather than by trial and error) will come with the registration/pronoun solution or some other trick to access the model from the plot build.
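The following base-R sketch shows one way to compute the extent from the other factor instead of by trial and error. It assumes a linear prediction axis: every case marker is projected onto the axis, the span between the outermost feet is kept, and ticks are placed at pretty() values of the variable within that span. The helper axis_extent() is hypothetical. It handles only the extent; offsetting the axis away from the origin is not attempted here:

```r
# Hypothetical helper (not ordr's implementation): delimit the prediction
# axis for one variable to the range spanned by the perpendicular feet of
# the case markers (rows of U), and calibrate it at "pretty" values.
axis_extent <- function(U, v, center = 0) {
  center <- as.numeric(center)
  len2 <- sum(v * v)
  proj <- as.vector(U %*% v) / len2       # foot of u_i is proj[i] * v
  mu_range <- range(proj) * len2 + center # predicted values at the two ends
  eps <- 1e-8 * diff(mu_range)            # tolerance for rounding error
  ticks <- pretty(mu_range)
  ticks <- ticks[ticks >= mu_range[1] - eps & ticks <= mu_range[2] + eps]
  # a prediction axis marks the value mu at ((mu - center) / |v|^2) * v
  pos <- outer((ticks - center) / len2, v)
  list(range = mu_range,
       ends = outer(range(proj), v),      # rows are the two axis end points
       ticks = data.frame(value = ticks, x = pos[, 1], y = pos[, 2]))
}

# The data from the first comment in this thread
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca <- prcomp(df)
ax <- axis_extent(pca$x[, 1:2], pca$rotation["x", 1:2],
                  center = pca$center["x"])
```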
I will check it out!
@jtr13 i'm drafting what i hope will be a legit solution in the
As of 13cad83, the … Reminder to self: This work toward delimited and offset calibrated axes was only a prelude to projection and vector sum graphical elements, so this issue will remain open until those are implemented.
This issue might be resolved alongside #64. Solutions to both are being drafted in …
A standalone (non-'tbl_ord') convenience parent
Once these tasks are done, the infrastructure will be ready to support
Tentatively resolved in the … Note: Interpolation should not require the referent trick, so the
@jtr13 i believe this issue is resolved with today's merge. As you have the bandwidth, i'd be glad to know if the following new layers and shortcuts suit your needs, or, if not, what remains to be done. Thanks a lot for your continued input! *Should perhaps be reverted to
Sounds good! I can't wait to finish end-of-semester grading and other admin stuff so I can take a look at this!
The new additions are great! I've experimented with
Questions:
A few really minor comments:
The more I explore the package the more I realize how much there is that I haven't tried! It is an impressive project.
Thank you! It's been so rewarding to work on for its own sake but it's another order of magnitude to see it used. Questions:
Comments:
@jtr13 FYI the
Thanks so much for the detailed explanations! I do have the "Referential stats" section -- sorry for the confusion on that. And I missed that
A geom layer geom_u_projection(from = i, to = j) should render a (by default dashed) line from the ordinates of the i-th row of $U$ to the linear subspace containing those of the j-th row of $V$, maybe with an optional cute right angle symbol. Both the from and to arguments could contain multiple indices. The projection should adopt the axes (primary or secondary) used by $U$.