[Rd] RE: [R] Mapping actual to expected columns for princomp object

Liaw, Andy andy_liaw at merck.com
Thu Mar 24 14:34:09 CET 2005


[Re-directing to R-devel, as I think this needs changes to the code.]

Can I suggest a modification to stats:::predict.princomp so that it checks
for column (variable) names?

In src/library/stats/R/princomp-add.R, insert the following after line 4:

    if (!is.null(cn <- names(object$center))) newdata <- newdata[, cn]

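With this change, Dana's example fails when the expected names are absent,
and otherwise the result depends on the column names, not the column order: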

> predict(pca1, frz)
Error in "[.data.frame"(newdata, , names(object$center)) : 
        undefined columns selected
> names(frz) <- c("x2", "x1")
> predict(pca1, frz)
        Comp.1      Comp.2
1  -3.29329963 -1.24675774
2   0.15760569  0.09364550
3   1.90206906  0.06292855
4  -0.92968723  0.64356801
5  -1.15298669  0.25451588
6   0.48466884 -0.87611668
7   0.98602646 -0.52156549
8  -1.53126034 -0.96259529
9  -0.79112984 -1.50831648
10  0.02997392 -0.18888807
> names(frz) <- c("x1", "x2")
> predict(pca1, frz)
        Comp.1      Comp.2
1   2.49603051 -2.42516162
2  -0.15633499  0.15754735
3  -1.77400454  0.81118427
4   1.05941012  0.23869214
5   1.11286213 -0.20669206
6  -0.83645436 -0.60720531
7  -1.15932677 -0.08488413
8   0.98526969 -1.47482877
9   0.09070675 -1.68781215
10 -0.14930067 -0.15239717
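
A slightly more defensive variant of the same check might look like the
sketch below. The helper name select_expected is hypothetical (it is not in
stats), and I am assuming that names(object$center) holds the variable names
from the fit. It reports which columns are missing and uses drop = FALSE so
a single-variable fit still gets a two-dimensional object.

```r
# Hypothetical helper, not part of the stats package: pick the columns a
# fitted princomp object expects, in the order it expects them.
select_expected <- function(object, newdata) {
  cn <- names(object$center)              # variable names stored at fit time
  if (is.null(cn)) return(newdata)        # fit had no names: nothing to match
  absent <- setdiff(cn, colnames(newdata))
  if (length(absent) > 0)
    stop("newdata is missing column(s): ", paste(absent, collapse = ", "))
  newdata[, cn, drop = FALSE]             # keep 2-d shape even for one column
}

# Usage: same data with the columns swapped gives the original scores back.
set.seed(1)
frx     <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
pca1    <- princomp(frx)
swapped <- frx[, c("x2", "x1")]
scores  <- predict(pca1, select_expected(pca1, swapped))
```

This can be called from user code before predict(), without touching the
method itself.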

Best,
Andy

> From: Dana Honeycutt
> 
> I am working with data sets in which the number and order of columns
> may vary, but each column is uniquely identified by its name.  E.g.,
> one data set might have columns
>         MW logP Num_Rings Num_H_Donors
> while another has columns
>         Num_Rings Num_Atoms Num_H_Donors logP MW
> 
> I would like to be able to perform a principal component analysis (PCA)
> on one data set and save the PCA object to a file.  In a later R session,
> I would like to load the object and then apply the loadings to a new
> data set in order to compute the principal component (PC) values for
> each row of new data.
> 
> I am trying to use the princomp method in R to do this. (I started 
> with prcomp, but found that there is no predict method for objects
> created by prcomp.)  The problem is that when using predict on a
> princomp object, R ignores the names of columns and simply assumes
> that the column order is the same as in the original data frame used
> to do the PCA.  (This contrasts, for example, with the behavior of a
> model produced by lm, which is aware of column names in a data frame.)
> 
> What I think I need to do is this:
> 
> 1. After reloading the princomp object, extract the names and order
> of columns that it expects. (If you look at the loadings for the
> object, you can see that this info is there, but I would like to 
> get at it directly somehow.)
> 
> 2. Reorder the columns in the new data set to correspond to this
> expected order, and remove any extra columns.
> 
> 3. Use the predict method to predict the PC values for the new data set.
> 
> Is this the best approach to achieve what I am attempting?
> 
> If so, can anyone tell me how to accomplish steps 1 and 2 above?
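
Until predict.princomp checks names itself, steps 1 to 3 can be done by hand.
The sketch below is only an illustration: the data are simulated, the object
names (train, new, expected, aligned) are made up, and it assumes the fit was
done on a data frame so that names(pca$center) carries the variable names.

```r
set.seed(42)
train <- data.frame(MW = rnorm(20), logP = rnorm(20),
                    Num_Rings = rnorm(20), Num_H_Donors = rnorm(20))
pca <- princomp(train)

# Step 1: names and order of the columns the object expects.
# (rownames(loadings(pca)) should carry the same names.)
expected <- names(pca$center)

# New data with a different column order and one extra column.
new <- data.frame(Num_Rings = rnorm(5), Num_Atoms = rnorm(5),
                  Num_H_Donors = rnorm(5), logP = rnorm(5), MW = rnorm(5))

# Step 2: reorder and drop extras by indexing with the expected names.
aligned <- new[, expected, drop = FALSE]

# Step 3: compute the PC values for each row of the new data.
pc <- predict(pca, aligned)
```

Indexing a data frame by a character vector errors out if any expected
column is missing, so a mismatch does not pass silently.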
> 
> Thanks,
> Dana Honeycutt
> 
> P.S. Here's a script that demonstrates the problem:
> 
> x1 <- rnorm(10)
> x2 <- rnorm(10)
> y <- rnorm(10)
> 
> frx <- data.frame(x1,x2)
> frxy <- data.frame(x1,x2,y)
> 
> lm1 <- lm(y~x1+x2,frxy)
> pca1 <- princomp(frx)
> 
> rm(x1,x2,y,frx,frxy)
> 
> z1 <- rnorm(10)
> z2 <- rnorm(10)
> frz <- data.frame(z1,z2)
> 
> predict(lm1, frz)  # gives error: Object "x1" not found
> predict(pca1, frz) # gives no error, indicating column names ignored
> 
> z3 <- rnorm(10)
> fr3z <- data.frame(frz,z3)
> predict(pca1,fr3z) # gives error due to unexpected number of columns
> 
> loadings(pca1) # shows linear combos of variables corresponding to PCs
> 


