[R] prcomp(X,center=F) ??
Jari Oksanen
jari.oksanen at oulu.fi
Sun Mar 8 16:05:28 CET 2009
Dear Agustin & the Listers,
Noncentred PCA is an old and establishes method. It is rarely used,
but still (methinks) it is used more often than it should be used.
There is nothing wrong in having noncentred PCA in R, and it is a real
PCA. Details will follow.
On 08/03/2009, at 11:07 AM, Agustin Lobo wrote:
> I do not understand, from a PCA point of view, the option center=F
> of prcomp()
>
> According to the help page, the calculation in prcomp() "is done by
> a singular value decomposition of the (centered and possibly scaled)
> data matrix, not by using eigen on the covariance matrix" (as it's
> done by princomp()) .
> "This is generally the preferred method for numerical accuracy"
>
> The question is that while
> prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F),
> but prcomp(X,center=F) has no equivalent in princomp()
princomp() does not have explicit noncentred analysis because its
algorithm is based on the eigen analysis of covariance matrix using
funciton cov.wt(). While function cov.wt() knows argument
'center' (which is the US spelling of centre), princomp() does not
pass that argument to cov.wt(). However, if you supply a noncentred
'covmat' instead of 'x' (see ?princomp), princomp() will give you
noncentred PCA.
We can only guess why the authors of princomp() did not allow
noncentred analysis. However, I know why the author of rda() function
in vegan did not explicitly allow noncentred analysis: he thought that
it is a Patently Bad Idea(TM). Further, he suggests in vegan function
that if you think you need noncentred analysis, you are able to edit
vegan::rda.default() to do so -- if you cannot edit the function, then
you probably are wrong in assuming you need noncentred analysis. It
may be that the princomp authors think in the same way (but they also
allow noncentred analyis iif you supply a noncentred 'covmat').
One of the sins of my youth was to publish a paper using noncentred
PCA. I'm penitent. (However, I commited bigger sins, too.)
> Also, the rotation made with either the eigenvectors of
> prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F)
> yields PCs with a minimum correlation, as expected
> for a PCA. But the rotation made with the eigenvectors of
> prcomp(X,center=F) yields axes that are correlated.
> Therefore, prcomp(X,center=F) is not really a PCA.
PCA axes are *orthogonal*, but not necessarily uncorrelated (people
often mistake these as synonyms). Centred orthogonal axes also are
uncorrelated, but noncentred orthogonal are (or may be) correlated.
Your example code may be simplified a bit: prcomp returns the rotated
matrix so that you do not need the rotation %*% data multiplication
after analysis: you get that in the analysis. Below I use
multivariate normal random numbers (MASS::mvrnorm) to generate
correlated observations:
library(MASS)
x<- mvrnorm(100, mu = c(1, 3), Sigma=matrix(c(1, 0.6, 0.6, 1), nrow=2))
pc <- prcomp(x, cent=F)
Here the scores are orthogonal:
crossprod(pc$x)
or the off-diagonal elements are numerically zero, but they are
correlated:
cor(pc$x)
The only requirement we have is orthogonality, and uncorrelatedness is
a collateral damage in centred analysis.
Cheers, Jari Oksanen
>
> See the following example, in which the second column of
> data matrix X is linearly correlated to the first column:
>
> > X <- cbind(rnorm(100,100,50),rnorm(100,100,50))
> > X[,2] <- X[,1]*1.5-50 +runif(100,-70,70)
> > plot(X)
> > cor(X[,1],X[,2])
> [1] 0.903597
>
> > eigvnocent <- prcomp(X,center=F,scaling=F)[[1]]
> > eigvcent <- prcomp(X,center=T,scaling=F)[[1]]
> > eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]]
> > eigveccent <- prcomp(X,center=T,scaling=F)[[2]]
>
> > PCnocent <- X%*%eigvecnocent
> > PCcent <- X%*%eigveccent
> > par(mfrow=c(2,2))
> > plot(X)
> > plot(PCnocent)
> > plot(PCcent)
>
> > cor(X[,1],X[,2])
> [1] 0.903597
> > cor(PCcent[,1],PCcent[,2])
> [1] -8.778818e-16
> > cor(PCnocent[,1],PCnocent[,2])
> [1] -0.6908334
> >
>
> Also the help page of prcomp() states:
> "Details
>
> The calculation is done by a singular value decomposition of the
> (centered and possibly scaled) data matrix..."
>
> The parenthesis implies some ambiguity, but I do interpret the
> sentence as indicating that the calculation should always be done
> using a centered data matrix.
> Finally, all the examples in the help page use centering (or
> scaling, which implies centering)
>
> Therefore, why the option center=F ?
There really is a small conflict between docs and function: the
function allows noncentred analysis, and it also performs such an
analysis if asked. This is documented for the argument "center", but
the option is later ignored in the passage you cite. Probably because
the authors of the function think that this offered option should not
be used. You may submit a report of this internal conflict in prcomp()
documents to R maintainers.
Cheers, Jari Oksanen
More information about the R-help
mailing list