[R] prcomp(X,center=F) ??

Sun Mar 8 16:05:28 CET 2009

Dear Agustin & the Listers,

Noncentred PCA is an old and established method. It is rarely used,
but still (methinks) it is used more often than it should be.
There is nothing wrong in having noncentred PCA in R, and it is a real
PCA. Details will follow.

On 08/03/2009, at 11:07 AM, Agustin Lobo wrote:

> I do not understand, from a PCA point of view, the option center=F
> of prcomp()
>
> According to the help page, the calculation in prcomp() "is done by  
> a singular value decomposition of the (centered and possibly scaled)  
> data matrix, not by using eigen on the covariance matrix"   (as it's  
> done by princomp()) .
> "This is generally the preferred method for numerical accuracy"
>
> The question is that while
> prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F),
> but prcomp(X,center=F) has no equivalent in princomp()

princomp() does not have an explicit noncentred analysis because its
algorithm is based on the eigen analysis of the covariance matrix using
function cov.wt(). While cov.wt() knows the argument
'center' (which is the US spelling of centre), princomp() does not
pass that argument to cov.wt(). However, if you supply a noncentred
'covmat' instead of 'x' (see ?princomp), princomp() will give you
noncentred PCA.
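
A minimal sketch of that 'covmat' route, on simulated data (the data and
object names are made up for illustration). It assumes that
cov.wt(x, center = FALSE) returns the raw cross-products divided by n - 1,
which is also the scaling prcomp() applies to its singular values:

```r
# Simulated data with nonzero column means (illustrative only)
set.seed(1)
x <- cbind(rnorm(100, mean = 1), rnorm(100, mean = 3))

# Noncentred "covariance": cross-products X'X scaled by n - 1
cm <- cov.wt(x, center = FALSE)$cov

# princomp() on the supplied matrix; no scores are returned because
# princomp() never sees the data
pc_cov <- princomp(covmat = cm)

# Noncentred prcomp() for comparison
pc_svd <- prcomp(x, center = FALSE)

# The standard deviations should agree up to rounding error
all.equal(as.numeric(pc_cov$sdev), as.numeric(pc_svd$sdev))
```

The loadings should agree as well, possibly with flipped signs, because the
sign of an eigenvector is arbitrary.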

We can only guess why the authors of princomp() did not allow  
noncentred analysis. However, I know why the author of rda() function  
in vegan did not explicitly allow noncentred analysis: he thought that  
it is a Patently Bad Idea(TM). Further, he suggests in the vegan function
that if you think you need noncentred analysis, you are able to edit
vegan::rda.default() to do so -- if you cannot edit the function, then
you are probably wrong in assuming you need noncentred analysis. It
may be that the princomp authors think in the same way (but they also
allow noncentred analysis if you supply a noncentred 'covmat').

One of the sins of my youth was to publish a paper using noncentred
PCA. I'm penitent. (However, I committed bigger sins, too.)

> Also, the rotation made with either the eigenvectors of  
> prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F)
> yields PCs with a minimum correlation, as expected
> for a PCA. But the rotation made with the eigenvectors of  
> prcomp(X,center=F) yields axes that are correlated.
> Therefore, prcomp(X,center=F) is not really a PCA.

PCA axes are *orthogonal*, but not necessarily uncorrelated (people
often mistake these for synonyms). Centred orthogonal axes are also
uncorrelated, but noncentred orthogonal axes are (or may be) correlated.
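
The reason is that covariance is computed from centred cross-products:
cov(u, v) = (sum(u * v) - n * mean(u) * mean(v)) / (n - 1). If sum(u * v)
is zero, the covariance still equals -n * mean(u) * mean(v) / (n - 1),
which vanishes only when at least one of the means is zero. A small
sketch with hypothetical toy vectors (not taken from any real data):

```r
# Hypothetical toy vectors: orthogonal, but with nonzero means
u <- c(1, 2, 3)
v <- c(3, 0, -1)
n <- length(u)

sum(u * v)   # cross-product is 0: the vectors are orthogonal
cov(u, v)    # yet the covariance is not zero
(sum(u * v) - n * mean(u) * mean(v)) / (n - 1)   # same value from the identity

# Centred orthogonal vectors (both means are zero) are uncorrelated
a <- c(-1, 0, 1)
b <- c(1, -2, 1)
sum(a * b)   # 0
cor(a, b)    # 0
```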

Your example code may be simplified a bit: prcomp() returns the rotated
data (the scores) in its 'x' component, so you do not need the
data %*% rotation multiplication after the analysis: you get that in the
analysis. Below I use multivariate normal random numbers (MASS::mvrnorm)
to generate correlated observations:

library(MASS)
x <- mvrnorm(100, mu = c(1, 3), Sigma = matrix(c(1, 0.6, 0.6, 1), nrow = 2))
pc <- prcomp(x, center = FALSE)

Here the scores are orthogonal:

crossprod(pc$x)

that is, the off-diagonal elements are numerically zero, but the scores
are correlated:

cor(pc$x)

The only requirement we have is orthogonality, and uncorrelatedness is
merely collateral damage in centred analysis.
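
To see the contrast side by side, here is a hedged sketch that runs both
analyses on the same simulated data. It uses base R's rnorm() instead of
MASS::mvrnorm() only to avoid a package dependency; the data and object
names are illustrative assumptions:

```r
# Simulated correlated data with nonzero column means (illustrative only)
set.seed(42)
x <- cbind(rnorm(100, mean = 1), rnorm(100, mean = 3))
x[, 2] <- x[, 2] + 0.6 * x[, 1]
n <- nrow(x)

pc_nc <- prcomp(x, center = FALSE)  # noncentred
pc_c  <- prcomp(x, center = TRUE)   # centred

# Scores are orthogonal in both analyses (numerically zero)
crossprod(pc_nc$x)[1, 2]
crossprod(pc_c$x)[1, 2]

# ... but they are uncorrelated only in the centred analysis
cor(pc_nc$x)[1, 2]
cor(pc_c$x)[1, 2]

# The noncentred covariance comes entirely from the score means
cov(pc_nc$x)[1, 2]
-n * mean(pc_nc$x[, 1]) * mean(pc_nc$x[, 2]) / (n - 1)
```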

Cheers, Jari Oksanen

>
> See the following example, in which the second column of
> data matrix X is linearly correlated to the first column:
>
> > X <- cbind(rnorm(100,100,50),rnorm(100,100,50))
> > X[,2] <- X[,1]*1.5-50 +runif(100,-70,70)
> > plot(X)
> > cor(X[,1],X[,2])
> [1] 0.903597
>
> > eigvnocent <- prcomp(X,center=F,scaling=F)[[1]]
> > eigvcent <- prcomp(X,center=T,scaling=F)[[1]]
> > eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]]
> > eigveccent <- prcomp(X,center=T,scaling=F)[[2]]
>
> > PCnocent <- X%*%eigvecnocent
> > PCcent <- X%*%eigveccent
> > par(mfrow=c(2,2))
> > plot(X)
> > plot(PCnocent)
> > plot(PCcent)
>
> > cor(X[,1],X[,2])
> [1] 0.903597
> > cor(PCcent[,1],PCcent[,2])
> [1] -8.778818e-16
> > cor(PCnocent[,1],PCnocent[,2])
> [1] -0.6908334
> >
>
> Also the help page of prcomp() states:
> "Details
>
> The calculation is done by a singular value decomposition of the  
> (centered and possibly scaled) data matrix..."
>
> The parenthesis implies some ambiguity, but I do interpret the  
> sentence as indicating that the calculation should always be done  
> using a centered data matrix.
> Finally, all the examples in the help page use centering (or  
> scaling, which implies centering)
>
> Therefore, why the option center=F ?

There really is a small conflict between the docs and the function: the
function allows noncentred analysis, and it also performs such an
analysis if asked. This is documented for the argument "center", but
the option is later ignored in the passage you cite, probably because
the authors of the function think that this option should not
be used. You may submit a report of this internal conflict in the prcomp()
documentation to the R maintainers.
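
If you want to check what the function does with center = FALSE, the
following sketch compares it with a plain svd() of the raw data matrix. It
assumes prcomp() divides the singular values by sqrt(n - 1); the simulated
data and object names are only for illustration:

```r
# Simulated data with nonzero column means (illustrative only)
set.seed(7)
x <- cbind(rnorm(50, mean = 1), rnorm(50, mean = 3))

pc <- prcomp(x, center = FALSE)
sv <- svd(x)   # SVD of the raw, noncentred data matrix

# Standard deviations are the singular values scaled by sqrt(n - 1)
all.equal(pc$sdev, sv$d / sqrt(nrow(x) - 1))

# Rotation and scores match the SVD (abs() guards against sign flips)
all.equal(abs(unname(pc$rotation)), abs(sv$v))
all.equal(abs(unname(pc$x)), abs(sv$u %*% diag(sv$d)))
```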

Cheers, Jari Oksanen