[R] prcomp(X,center=F) ??

Agustin Lobo aloboaleu at gmail.com
Sun Mar 8 10:07:49 CET 2009


I do not understand, from a PCA point of view, the option center=F
of prcomp()

According to the help page, the calculation in prcomp() "is done by a 
singular value decomposition of the (centered and possibly scaled) data 
matrix, not by using eigen on the covariance matrix" (as is done by 
princomp()).
"This is generally the preferred method for numerical accuracy"

The question is that, while
prcomp(X,center=TRUE,scale.=FALSE) is equivalent to princomp(X,cor=FALSE),
prcomp(X,center=FALSE) has no equivalent in princomp().
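
A minimal sketch of that equivalence, assuming made-up data (the seed, the matrix and all variable names below are only for illustration). It compares the SVD route with the eigen route, and then prcomp() with princomp(), keeping in mind that princomp() uses divisor n while prcomp() uses n - 1, and that the sign of each axis is arbitrary:

```r
set.seed(1)                                 # arbitrary seed, for reproducibility only
X <- matrix(rnorm(200, 100, 50), ncol = 2)  # made-up 100 x 2 data matrix
n <- nrow(X)

# Route 1: SVD of the column-centered data matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
sdev_svd <- svd(Xc)$d / sqrt(n - 1)

# Route 2: eigen decomposition of the covariance matrix
sdev_eig <- sqrt(eigen(cov(X))$values)

p1 <- prcomp(X, center = TRUE, scale. = FALSE)
p2 <- princomp(X, cor = FALSE)

# Both routes give the same standard deviations as prcomp()
same_svd_eigen <- isTRUE(all.equal(sdev_svd, sdev_eig))
same_prcomp    <- isTRUE(all.equal(sdev_svd, p1$sdev))

# princomp() divides by n, prcomp() by n - 1
same_sdev <- isTRUE(all.equal(unname(p2$sdev), p1$sdev * sqrt((n - 1) / n)))

# The axes agree up to the sign of each column
same_axes <- isTRUE(all.equal(abs(unclass(p2$loadings)), abs(p1$rotation),
                              check.attributes = FALSE))

print(c(same_svd_eigen, same_prcomp, same_sdev, same_axes))
```

There is no argument in princomp() that switches the centering off, so this comparison cannot be repeated for center=FALSE.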

Also, the rotation made with the eigenvectors of either
prcomp(X,center=TRUE,scale.=FALSE) or princomp(X,cor=FALSE)
yields PCs that are uncorrelated, as expected
for a PCA. But the rotation made with the eigenvectors of 
prcomp(X,center=FALSE) yields axes that are correlated.
Therefore, prcomp(X,center=FALSE) is not really a PCA.

See the following example, in which the second column of
data matrix X is linearly correlated to the first column:

 > X <- cbind(rnorm(100,100,50),rnorm(100,100,50))
 > X[,2] <- X[,1]*1.5-50 +runif(100,-70,70)
 > plot(X)
 > cor(X[,1],X[,2])
[1] 0.903597

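 > # prcomp() names this argument scale.; $sdev holds standard deviations, not eigenvalues
 > eigvnocent <- prcomp(X,center=FALSE,scale.=FALSE)$sdev
 > eigvcent <- prcomp(X,center=TRUE,scale.=FALSE)$sdev
 > eigvecnocent <- prcomp(X,center=FALSE,scale.=FALSE)$rotation
 > eigveccent <- prcomp(X,center=TRUE,scale.=FALSE)$rotation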

 > PCnocent <- X%*%eigvecnocent
 > PCcent <- X%*%eigveccent
 > par(mfrow=c(2,2))
 > plot(X)
 > plot(PCnocent)
 > plot(PCcent)

 > cor(X[,1],X[,2])
[1] 0.903597
 > cor(PCcent[,1],PCcent[,2])
[1] -8.778818e-16
 > cor(PCnocent[,1],PCnocent[,2])
[1] -0.6908334
 >
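
A possible reading of the result above, sketched with simulated data built like X (the seed and the variable names are assumptions for illustration): without centering, the SVD diagonalizes the cross-product matrix t(X) %*% X, so the rotated axes are orthogonal about the origin, but cor() subtracts the column means first and so need not vanish.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
x1 <- rnorm(100, 100, 50)
X  <- cbind(x1, 1.5 * x1 - 50 + runif(100, -70, 70))

pc <- prcomp(X, center = FALSE)
S  <- pc$x                          # scores; equal to X %*% pc$rotation here

# Inner product of the two score columns about the origin:
# zero up to rounding error
inner <- crossprod(S)[1, 2]
rel_inner <- abs(inner) / max(diag(crossprod(S)))

# Correlation is computed about the column means, so it can be far from zero
r <- cor(S[, 1], S[, 2])

print(rel_inner)
print(r)
```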

Also the help page of prcomp() states:
"Details

The calculation is done by a singular value decomposition of the 
(centered and possibly scaled) data matrix..."

The parenthesis is somewhat ambiguous, but I interpret the sentence 
as indicating that the calculation should always be done using a 
centered data matrix.
Finally, all the examples in the help page use centering (or scaling, 
which implies centering).
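
One way to check what was actually applied in a given call is to look at the center component of the returned object. A small sketch, again with made-up data:

```r
set.seed(1)                                 # arbitrary seed, for reproducibility only
X <- matrix(rnorm(200, 100, 50), ncol = 2)  # made-up 100 x 2 data matrix

cen_default <- prcomp(X)$center                  # column means: centering is the default
cen_off     <- prcomp(X, center = FALSE)$center  # FALSE: no centering was applied

print(cen_default)
print(cen_off)
```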

Therefore, why is there an option center=F?

Agus



-- 
Dr. Agustin Lobo
Institut de Ciencies de la Terra "Jaume Almera" (CSIC)
LLuis Sole Sabaris s/n
08028 Barcelona
Spain
Tel. 34 934095410
Fax. 34 934110012
email: Agustin.Lobo at ija.csic.es
http://www.ija.csic.es/gt/obster



