[R] Mapping actual to expected columns for princomp object
Dana Honeycutt
dana at accelrys.com
Thu Mar 24 01:09:11 CET 2005
I am working with data sets in which the number and order of columns
may vary, but each column is uniquely identified by its name. E.g.,
one data set might have columns
    MW logP Num_Rings Num_H_Donors
while another has columns
    Num_Rings Num_Atoms Num_H_Donors logP MW
I would like to be able to perform a principal component analysis (PCA)
on one data set and save the PCA object to a file. In a later R session,
I would like to load the object and then apply the loadings to a new
data set in order to compute the principal component (PC) values for
each row of new data.
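To make the save-and-reload part concrete, here is a minimal sketch. The data frame `train` and the file path `pca_file` are made up for illustration, and saveRDS/readRDS is only one way to do it (save and load would work as well).

```r
# Hypothetical training data standing in for a real data set
set.seed(1)
train <- data.frame(MW = rnorm(20), logP = rnorm(20))

# Fit the PCA and write the fitted object to a file
pca <- princomp(train)
pca_file <- tempfile(fileext = ".rds")  # hypothetical path; any writable path works
saveRDS(pca, pca_file)

# In a later session: read the object back
pca_reloaded <- readRDS(pca_file)
```

The reloaded object should carry the same loadings, centers and scales as the one that was saved.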
I am trying to use the princomp function in R to do this. (I started
with prcomp, but found that there is no predict method for objects
created by prcomp.) The problem is that when using predict on a
princomp object, R ignores the names of columns and simply assumes
that the column order is the same as in the original data frame used
to do the PCA. (This contrasts, for example, with the behavior of a
model produced by lm, which is aware of column names in a data frame.)
What I think I need to do is this:
1. After reloading the princomp object, extract the names and order
of columns that it expects. (If you look at the loadings for the
object, you can see that this info is there, but I would like to
get at it directly somehow.)
2. Reorder the columns in the new data set to correspond to this
expected order, and remove any extra columns.
3. Use the predict method to predict the PC values for the new data set.
Is this the best approach to achieve what I am attempting?
If so, can anyone tell me how to accomplish steps 1 and 2 above?
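For what it is worth, here is a minimal sketch of how steps 1 to 3 might look. It assumes the expected variable names can be read from the row names of the loadings matrix. The data frames `train` and `new` are made up for illustration, and I do not know whether this is the recommended way.

```r
# Hypothetical training data with the first column layout from above
set.seed(1)
train <- data.frame(MW = rnorm(20), logP = rnorm(20),
                    Num_Rings = rnorm(20), Num_H_Donors = rnorm(20))
pca <- princomp(train)

# Step 1: variable names, in the order the princomp object expects
# (assumption: the row names of the loadings matrix carry this)
expected <- rownames(loadings(pca))

# Hypothetical new data: different column order plus one extra column
new <- data.frame(Num_Rings = rnorm(5), Num_Atoms = rnorm(5),
                  Num_H_Donors = rnorm(5), logP = rnorm(5), MW = rnorm(5))

# Step 2: stop early if a needed column is absent, then select and
# reorder by name, which also drops the extra column
missing_cols <- setdiff(expected, names(new))
if (length(missing_cols) > 0)
    stop("new data lacks columns: ", paste(missing_cols, collapse = ", "))
new_aligned <- new[, expected, drop = FALSE]

# Step 3: compute the PC values for each row of the new data
scores <- predict(pca, new_aligned)
```

Indexing by name handles both the reordering and the removal of extra columns in one step, and the setdiff check catches a missing column before predict is called.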
Thanks,
Dana Honeycutt
P.S. Here's a script that demonstrates the problem:
# Training data: two predictors and a response
x1 <- rnorm(10)
x2 <- rnorm(10)
y <- rnorm(10)
frx <- data.frame(x1, x2)
frxy <- data.frame(x1, x2, y)
lm1 <- lm(y ~ x1 + x2, frxy)
pca1 <- princomp(frx)
rm(x1, x2, y, frx, frxy)
# New data whose column names differ from the training data
z1 <- rnorm(10)
z2 <- rnorm(10)
frz <- data.frame(z1, z2)
predict(lm1, frz) # gives error: Object "x1" not found
predict(pca1, frz) # gives no error, indicating column names ignored
# New data with an extra column
z3 <- rnorm(10)
fr3z <- data.frame(frz, z3)
predict(pca1, fr3z) # gives error due to unexpected number of columns
loadings(pca1) # shows linear combos of variables corresponding to PCs