[R] project test data into principal components of training dataset

olsen o.o.wolf at qmul.ac.uk
Mon Apr 18 18:20:18 CEST 2016

Hi there,

I have a training dataset and a test dataset. My aim is to visually
locate the test data within the calibrated space spanned by the PCs of
the training dataset, while keeping the training coordinates fixed so
that they can serve as a ruler for measuring additional test datasets
as they come up.

Please find a minimal working example using the wine dataset below.
Ideally I would like to use ggbiplot, as it comes with elegant
features, but it only accepts objects of class prcomp, princomp, PCA, or
lda, a requirement that the predicted test data do not fulfil.

I'm still slightly wet behind my R ears and the only solution I can
think of is to plot the calibrated space in ggbiplot and the test
data in ggplot and then join them, in the worst case by exporting both
as SVG and importing them into Inkscape. That is rather complicated,
and the scaling of the two plots differs.

Any hint on how this can be accomplished is very welcome!

Thanks and greets

I started a thread on Stack Overflow on this issue but have had no
relevant pointers so far.


##load ggbiplot, which also ships the wine data (wine, wine.class)
library(ggbiplot)
data(wine)

##pca on the wine dataset used as training data
wine.pca <- prcomp(wine, center = TRUE, scale. = TRUE)

wine$class <- wine.class

##simulate test data by generating three new wine classes
wine.new.1 <- wine[c(sample(1:nrow(wine), 25)),]
wine.new.2 <- wine[c(sample(1:nrow(wine), 43)),]
wine.new.3 <- wine[c(sample(1:nrow(wine), 36)),]

##Predict PCs for the new classes by transforming
#them using the predict.prcomp function
pred.new.1 <- predict(wine.pca, newdata = wine.new.1)
pred.new.2 <- predict(wine.pca, newdata = wine.new.2)
pred.new.3 <- predict(wine.pca, newdata = wine.new.3)

#simulate the classes for the new sorts
wine.new.1$class <- rep("new.wine.1", nrow(wine.new.1))
wine.new.2$class <- rep("new.wine.2", nrow(wine.new.2))
wine.new.3$class <- rep("new.wine.3", nrow(wine.new.3))
wine.new.bind <- rbind(wine.new.1, wine.new.2, wine.new.3)

##compose the plot by joining the PCA ggbiplot of the training data with
##the test data from ggplot
#plot the calibrated space resulting from the training data
g.train <- ggbiplot(wine.pca, obs.scale = 1, var.scale = 1,
                    groups = wine$class, ellipse = TRUE, circle = TRUE)
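#plot the test data resulting from the prediction
#(take the predicted scores, not the raw wine variables)
pred.new.bind <- rbind(pred.new.1, pred.new.2, pred.new.3)
rownames(pred.new.bind) <- NULL  #sampled rows can repeat across classes
df.pred <- data.frame(PC1 = pred.new.bind[,1], PC2 = pred.new.bind[,2],
                      PC3 = pred.new.bind[,3], PC4 = pred.new.bind[,4],
                      classes = wine.new.bind$class)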
g.test <- ggplot(df.pred, aes(PC1, PC2, color = classes, shape = classes)) +
  geom_point() + stat_ellipse()
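
One possible route, given only as a rough sketch and not as a ggbiplot
solution: draw the training scores (wine.pca$x) and the predicted test
scores in a single plain ggplot2 panel, so that both share the raw PC
coordinates and whatever rescaling ggbiplot may apply to its scores does
not enter. The sketch assumes that ggbiplot is installed (it is used here
only for the wine data) and that a single simulated test class is enough
to show the idea. The names wine.new, manual, train.scores, test.scores,
scores and g are made up for this illustration. This version gives up the
variable arrows and the circle of the biplot.

```r
## Sketch only: training and test scores in one shared PC coordinate system
library(ggplot2)
library(ggbiplot)  # assumed to be installed; used here only for the wine data

data(wine)  # loads `wine` and `wine.class`

# PCA on the training data; centre and scale are stored in the prcomp object
wine.pca <- prcomp(wine, center = TRUE, scale. = TRUE)

# simulated test data: a random subset of the training rows
set.seed(1)
wine.new <- wine[sample(nrow(wine), 25), ]

# predict() applies the training centre, scale and rotation to the new rows,
# so the training coordinates themselves never change
pred.new <- predict(wine.pca, newdata = wine.new)

# the same projection done by hand, as a check on what predict() does
manual <- scale(wine.new, center = wine.pca$center,
                scale = wine.pca$scale) %*% wine.pca$rotation
stopifnot(isTRUE(all.equal(unname(pred.new), unname(manual))))

# collect both sets of scores in one data frame (row names dropped on purpose)
train.scores <- data.frame(PC1 = unname(wine.pca$x[, 1]),
                           PC2 = unname(wine.pca$x[, 2]),
                           set = "training",
                           class = as.character(wine.class))
test.scores <- data.frame(PC1 = unname(pred.new[, 1]),
                          PC2 = unname(pred.new[, 2]),
                          set = "test",
                          class = "new.wine.1")
scores <- rbind(train.scores, test.scores)

# one panel: ellipses for the training classes only, points for everything
g <- ggplot(scores, aes(PC1, PC2, colour = class)) +
  geom_point(aes(shape = set)) +
  stat_ellipse(data = subset(scores, set == "training")) +
  coord_equal()
```

Further test datasets could be pushed through the same predict() call and
appended to scores without touching the training coordinates.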

