[R] scores for a new observation from PCAgrid() in pcaPP
Kari Ruohonen
kari.ruohonen at utu.fi
Fri Oct 15 14:30:28 CEST 2010
Hi,
I am trying to compute scores for a new observation from a PCA previously
computed with the PCAgrid() function in the pcaPP package. My data has
more variables than observations.
Here is an imaginary data set to show the case:
> library(MASS) # provides mvrnorm()
> n.samples<-30
> n.bins<-1000
> x.sim<-rep(0,n.bins)
> V.sim<-diag(n.bins)
> mtx<-array(dim=c(n.samples,n.bins))
> for(i in 1:n.samples) mtx[i,]<-mvrnorm(1,x.sim,V.sim)
With prcomp() I can do the following:
> pc.pr2<-prcomp(mtx,scale=TRUE)
> newscr.pr2<-scale(t(mtx[1,]),pc.pr2$center,pc.pr2$scale)%*%pc.pr2$rotation
The latter computes the scores for the first row of mtx. I can verify
that the scores are the same as computed originally by comparing with
> pc.pr2$x[1,] # that will print out the scores for the first observation
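The same check can also be written with predict(), which for prcomp objects applies the stored centering, scaling and rotation to new data. A minimal sketch, not the exact session above: the number of variables is reduced to 50 (still p>n) and the seed is arbitrary, both only to keep it quick and reproducible, and rnorm() stands in for mvrnorm() because the covariance matrix here is the identity.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n.samples <- 30
n.bins <- 50                     # assumption: smaller than in the post, still p > n
mtx <- matrix(rnorm(n.samples * n.bins), nrow = n.samples)

pc.pr2 <- prcomp(mtx, scale. = TRUE)

# Keep the first observation as a 1 x p matrix
new.obs <- mtx[1, , drop = FALSE]

# Manual projection: centre, scale, then rotate
manual <- scale(new.obs, pc.pr2$center, pc.pr2$scale) %*% pc.pr2$rotation

# Projection via the predict() method for prcomp objects
via.predict <- predict(pc.pr2, newdata = new.obs)

# Both should agree with the stored scores up to numerical tolerance
ok.manual  <- isTRUE(all.equal(as.numeric(manual), as.numeric(pc.pr2$x[1, ])))
ok.predict <- isTRUE(all.equal(as.numeric(via.predict), as.numeric(pc.pr2$x[1, ])))
print(c(manual = ok.manual, predict = ok.predict))
```

Using mtx[1, , drop=FALSE] instead of t(mtx[1,]) gives the same 1 x p matrix but makes the intent (one observation as a row) explicit.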
Now I try the same with PCAgrid() as follows:
> library(pcaPP) # provides PCAgrid()
> pc.pp2<-PCAgrid(mtx,k=min(dim(mtx)),scale=mad)
> newscr.pp2<-scale(t(mtx[1,]),pc.pp2$center,pc.pp2$scale)%*%pc.pp2$loadings
The values in newscr.pp2 do not match the scores stored in the pc.pp2
object, as can be verified by comparing with:
> pc.pp2$x[1,]
What am I missing? Or is it that, for the grid method, scores cannot be
computed from the loadings and the original observations in this way?
For the case p<n, i.e. when there are more observations than variables,
the scores computed from the loadings match the scores in the model
object for the PCAgrid() method as well, so the behaviour described above
seems to be specific to cases where p>n.
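To put a number on the mismatch instead of comparing printed scores by eye, one option is a small helper that takes the centre, scale, loadings and stored scores as explicit arguments. This is only a sketch: score_gap is a hypothetical name, not a function from any package, and it is demonstrated here with prcomp() alone, on simulated data with arbitrary dimensions and seed, once for p>n and once for p<n. The corresponding components of a PCAgrid() fit could be passed in the same way.

```r
# Hypothetical helper: largest absolute difference between the stored scores
# and scores recomputed as scale(X, center, scale) %*% loadings
score_gap <- function(X, center, scale, loadings, scores) {
  recomputed <- scale(X, center = center, scale = scale) %*% loadings
  max(abs(unname(recomputed) - unname(scores)))
}

set.seed(2)                                   # arbitrary seed
wide <- matrix(rnorm(30 * 100), nrow = 30)    # p > n
tall <- matrix(rnorm(100 * 30), nrow = 100)   # p < n

fit.wide <- prcomp(wide, scale. = TRUE)
fit.tall <- prcomp(tall, scale. = TRUE)

gap.wide <- score_gap(wide, fit.wide$center, fit.wide$scale,
                      fit.wide$rotation, fit.wide$x)
gap.tall <- score_gap(tall, fit.tall$center, fit.tall$scale,
                      fit.tall$rotation, fit.tall$x)

# For prcomp() both gaps should be at the level of rounding error
print(c(wide = gap.wide, tall = gap.tall))
```

A gap well above rounding error for one of the two shapes would indicate that the stored scores were not obtained by this plain centre, scale and rotate computation.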
Many thanks for any help,
Kari