[R] How to plot PCA output?

Tue May 8 15:02:17 CEST 2012

> [...]

> But having indicated that I don't see a biplot's multiple scales as particularly likely to confuse or mislead, I'm always interested in alternatives. The interesting question is 'given the same objective - a qualitative indication of which variables have most influenced the location of particular data points (or vice versa) and in which general direction - what do you suggest instead?'
> 
> Steve Ellison

Steve is probably looking for answers from others, but if the variables are relatively few, I plot the loadings against the variables for each of the first few PCs, using something like a bar plot whose bars can go positive or negative (not the best data-ink ratio, however).  So: PC1 loadings vs variable names, PC2 loadings vs variable names, etc.

If there are a lot of variables, I use a dot rather than a bar, or more generally, a line (for instance, spectroscopic data where the PC1 loadings are plotted against thousands of frequencies).
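
A minimal sketch of the many-variable case, using base graphics. The "spectra" here are simulated purely for illustration (one made-up band near 1700 whose height varies between samples); the names freq, peak, conc, spectra and s.pca are hypothetical and not from any real data set.

```r
# Simulated "spectra": 30 samples x 500 frequencies (made-up data, illustration only)
set.seed(1)
freq <- seq(400, 4000, length.out = 500)
peak <- dnorm(freq, mean = 1700, sd = 60)   # a single band shape
conc <- runif(30)                           # hypothetical per-sample amount
spectra <- outer(conc, peak) +
  matrix(rnorm(30 * 500, sd = 1e-5), nrow = 30)   # small added noise

s.pca <- prcomp(spectra)

# PC1 loadings vs frequency, drawn as a line rather than bars or dots
plot(freq, s.pca$rotation[, "PC1"], type = "l",
     xlab = "frequency", ylab = "PC1 loading")
abline(h = 0, lty = 2)   # zero reference line so the sign is easy to read
```

With this simulated input the PC1 loading line should look like the band itself (possibly flipped in sign), which is the sort of thing one reads off such a plot.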

The magnitude and sign of the loading for each variable give you a sense of the contribution of that variable to the given PC.
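
To put numbers on that, here is a small sketch using the built-in iris data and prcomp(). Reading the squared loadings as shares of the PC is one common convention and is my framing here, not the only way to quantify contribution.

```r
i.pca <- prcomp(iris[, 1:4])
pc1 <- i.pca$rotation[, "PC1"]

# Sign gives the direction, magnitude gives the weight of each variable.
# (The overall sign of a PC is arbitrary and may differ between platforms.)
round(pc1, 3)

# Each column of the rotation matrix has unit length, so the squared
# loadings sum to 1 and can be read as each variable's share of the PC.
round(pc1^2, 3)
sum(pc1^2)

# Variables ordered by absolute loading, largest first
names(sort(abs(pc1), decreasing = TRUE))
```

For iris, Petal.Length comes out with the largest absolute loading on PC1.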

I suspect this is not what Steve had in mind (he no doubt knows these things well already) but I'm also always on the lookout for good displays.  Share 'em if you got 'em.

Bryan

i.pca <- prcomp(iris[,1:4])
library("ggplot2")

# plot scores
scores <- as.data.frame(i.pca$x)
qplot(x = PC1, y = PC2, data = scores, geom = "point", col = iris[,5])

# Loadings on PC1 (few variables)

loadings <- as.data.frame(i.pca$rotation)
loadings$var <- colnames(iris[,1:4])
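# geom = "col" draws the loading values as given; geom = "bar" would try to count rows instead
# (geom_col needs ggplot2 >= 2.2.0; on older versions use geom = "bar", stat = "identity")
qplot(x = var, y = PC1, data = loadings, geom = "col")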

# Could also use geom = "point" but when there are many variables you may wish to connect the points too.
# Compare to
biplot(i.pca)

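And you can see the biplot carries some additional information compared to the simple loading plot (it overlays the scores and the loadings for the first two PCs in one display, each on its own scale), but I'd have to dig out exactly how it scales them and whether that is especially useful.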