[Rd] RE: [R] Mapping actual to expected columns for princomp object
Liaw, Andy
andy_liaw at merck.com
Thu Mar 24 14:34:09 CET 2005
[Re-directing to R-devel, as I think this needs changes to the code.]
Can I suggest a modification to stats:::predict.princomp so that it checks
for column (variable) names?
In src/library/stats/R/princomp-add.R, insert the following after line 4:
if (!is.null(cn <- names(object$center))) newdata <- newdata[, cn]
Now Dana's example looks like:
> predict(pca1, frz)
Error in "[.data.frame"(newdata, , names(object$center)) :
        undefined columns selected
> names(frz) <- c("x2", "x1")
> predict(pca1, frz)
        Comp.1      Comp.2
1  -3.29329963 -1.24675774
2   0.15760569  0.09364550
3   1.90206906  0.06292855
4  -0.92968723  0.64356801
5  -1.15298669  0.25451588
6   0.48466884 -0.87611668
7   0.98602646 -0.52156549
8  -1.53126034 -0.96259529
9  -0.79112984 -1.50831648
10  0.02997392 -0.18888807
> names(frz) <- c("x1", "x2")
> predict(pca1, frz)
        Comp.1      Comp.2
1   2.49603051 -2.42516162
2  -0.15633499  0.15754735
3  -1.77400454  0.81118427
4   1.05941012  0.23869214
5   1.11286213 -0.20669206
6  -0.83645436 -0.60720531
7  -1.15932677 -0.08488413
8   0.98526969 -1.47482877
9   0.09070675 -1.68781215
10 -0.14930067 -0.15239717
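Until such a check is in predict.princomp itself, the same idea can be tried from user code without touching the stats sources. The following is only a sketch: predict_by_name is a hypothetical helper name, not an existing function, and it assumes the princomp object was fitted to data with column names, so that names(object$center) is not NULL.

```r
# Hypothetical helper (not part of stats): select and reorder the columns
# of newdata by name, then delegate to the regular predict() method.
predict_by_name <- function(object, newdata, ...) {
  cn <- names(object$center)
  if (!is.null(cn)) {
    missing_cols <- setdiff(cn, colnames(newdata))
    if (length(missing_cols) > 0L)
      stop("newdata lacks columns: ", paste(missing_cols, collapse = ", "))
    # drop = FALSE keeps a one-column selection two-dimensional
    newdata <- newdata[, cn, drop = FALSE]
  }
  predict(object, newdata, ...)
}

set.seed(1)
frx  <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
pca1 <- princomp(frx)

# The same data with its columns in the opposite order
swapped <- frx[, c("x2", "x1")]
same <- all.equal(predict_by_name(pca1, swapped), predict(pca1, frx))
```

Here same should be TRUE, since the helper puts the columns back into the fitted order before predicting. Unlike the one-line patch, the helper names the missing columns in its error message, which is a design choice of this sketch rather than part of the proposal.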
Best,
Andy
> From: Dana Honeycutt
>
> I am working with data sets in which the number and order of columns
> may vary, but each column is uniquely identified by its name. E.g.,
> one data set might have columns
> MW logP Num_Rings Num_H_Donors
> while another has columns
> Num_Rings Num_Atoms Num_H_Donors logP MW
>
> I would like to be able to perform a principal component analysis (PCA)
> on one data set and save the PCA object to a file. In a later R session,
> I would like to load the object and then apply the loadings to a new
> data set in order to compute the principal component (PC) values for
> each row of new data.
>
> I am trying to use the princomp method in R to do this. (I started
> with prcomp, but found that there is no predict method for objects
> created by prcomp.) The problem is that when using predict on a
> princomp object, R ignores the names of columns and simply assumes
> that the column order is the same as in the original data frame used
> to do the PCA. (This contrasts, for example, with the behavior of a
> model produced by lm, which is aware of column names in a data frame.)
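To illustrate why column order matters here: as far as I can tell, the scores amount to centring the new data, scaling it, and multiplying by the loadings matrix, all of which work on column positions rather than names. The sketch below assumes exactly that; manual_scores is a hypothetical helper written for illustration and is not the stats implementation.

```r
set.seed(7)
frx  <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 3))
pca1 <- princomp(frx)

# Hypothetical helper: compute scores purely by column position.
manual_scores <- function(object, newdata) {
  m <- as.matrix(newdata)
  scale(m, center = object$center, scale = object$scale) %*%
    unclass(object$loadings)
}

# Columns in the fitted order: reproduces the scores stored in the object
in_order <- manual_scores(pca1, frx)
matches_fit <- all.equal(in_order, pca1$scores, check.attributes = FALSE)

# Columns swapped: no error is raised, but the numbers are different
swapped_order <- manual_scores(pca1, frx[, c("x2", "x1")])
differs <- !isTRUE(all.equal(in_order, swapped_order,
                             check.attributes = FALSE))
```

Both matches_fit and differs should come out TRUE: the positional computation agrees with the fit only when the columns arrive in the original order.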
>
> What I think I need to do is this:
>
> 1. After reloading the princomp object, extract the names and order
> of columns that it expects. (If you look at the loadings for the
> object, you can see that this info is there, but I would like to
> get at it directly somehow.)
>
> 2. Reorder the columns in the new data set to correspond to this
> expected order, and remove any extra columns.
>
> 3. Use the predict method to predict the PC values for the new data set.
>
> Is this the best approach to achieve what I am attempting?
>
> If so, can anyone tell me how to accomplish steps 1 and 2 above?
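Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are random numbers made up for the example, the column names are taken from the two layouts described earlier in the message, and the object is saved and reloaded with saveRDS/readRDS into a temporary file to stand in for the "later R session".

```r
set.seed(42)

# First session: fit the PCA and save the object to a file
train <- data.frame(MW = rnorm(10), logP = rnorm(10),
                    Num_Rings = rnorm(10), Num_H_Donors = rnorm(10))
pca_file <- tempfile(fileext = ".rds")
saveRDS(princomp(train), pca_file)

# Later session: reload the object
pca <- readRDS(pca_file)

# Step 1: the expected column names, in order, are the row names
# of the loadings matrix
expected <- rownames(loadings(pca))

# New data with a different column order and an extra column (Num_Atoms)
new_data <- data.frame(Num_Rings = rnorm(5), Num_Atoms = rnorm(5),
                       Num_H_Donors = rnorm(5), logP = rnorm(5),
                       MW = rnorm(5))

# Step 2: check that nothing is missing, then reorder and drop extras
missing_cols <- setdiff(expected, names(new_data))
if (length(missing_cols) > 0L)
  stop("new data lacks columns: ", paste(missing_cols, collapse = ", "))
aligned <- new_data[, expected, drop = FALSE]

# Step 3: compute the PC values for each row of the new data
scores <- predict(pca, aligned)
```

The same names are also available as names(pca$center). Indexing by name does the reordering and the removal of extra columns in a single step.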
>
> Thanks,
> Dana Honeycutt
>
> P.S. Here's a script that demonstrates the problem:
>
> x1 <- rnorm(10)
> x2 <- rnorm(10)
> y <- rnorm(10)
>
> frx <- data.frame(x1,x2)
> frxy <- data.frame(x1,x2,y)
>
> lm1 <- lm(y~x1+x2,frxy)
> pca1 <- princomp(frx)
>
> rm(x1,x2,y,frx,frxy)
>
> z1 <- rnorm(10)
> z2 <- rnorm(10)
> frz <- data.frame(z1,z2)
>
> predict(lm1, frz) # gives error: Object "x1" not found
> predict(pca1, frz) # gives no error, indicating column names ignored
>
> z3 <- rnorm(10)
> fr3z <- data.frame(frz,z3)
> predict(pca1,fr3z) # gives error due to unexpected number of columns
>
> loadings(pca1) # shows linear combos of variables corresponding to PCs