[R] Questions about results from PCAproj for robust principal component analysis

Tue Feb 13 21:33:10 CET 2007

Hi.

I have been looking at the PCAproj function in package pcaPP (R 2.4.1) for 
robust principal components, and I'm trying to interpret the results.  I 
started with a data matrix of dimensions RxC (R is the number of rows / 
observations, C the number of columns / variables).  PCAproj returns a list 
of class princomp, similar to the output of the function princomp.  In a 
case where I can run princomp, I would get the following, from executing  
dmpca = princomp(datamatrix) :
-	the vector, sdev, of length C, contains the standard deviations of the
		components in order by descending value; the squares are the
		eigenvalues of the covariance matrix
-	the matrix, loadings, has dimension CxC, and the columns are the
		eigenvectors of the covariance matrix, in the same order as the sdev
		vector; the columns are orthonormal:
		sum(dmpca$loadings[,i]*dmpca$loadings[,j]) = 1 if i == j, ~ 0 if i != j
-	the vector, center, of length C, contains the means of the variable
		columns in the original data matrix, in the same order as the
		original columns
-	the vector, scale, of length C, contains the scalings applied to each
		variable, in the same order as the original columns
-	n.obs contains the number of observations used in the computation; this
		number equals R when there is no missing data
-	the matrix, scores, has dimension RxC, and it can be thought of as the
		projection of the centered data onto the eigenvectors in loadings;
		the columns of scores are the principal components.  princomp
		subtracts the column means first, so (with the default cor = FALSE)
		the formula is:
		dmpca$scores = t(t(datamatrix) - dmpca$center)%*%dmpca$loadings
		and apply(dmpca$scores,2,mean) returns a length C vector of (effectively)
		zeroes; also the principal components (columns of scores) are orthogonal
		(but not orthonormal):
		sum(dmpca$scores[,i]*dmpca$scores[,j]) ~ 0 if i != j, > 0 if i == j
-	call contains the function call, in this case princomp(x = datamatrix)
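
The properties listed above can be checked numerically.  What follows is
only a minimal sketch on made-up random data (the 50x4 matrix and the *_err
names are illustrative, not the real datamatrix); each *_err value should
come out at rounding-error size:

```r
# Illustrative data only: 50 observations of 4 variables (R = 50, C = 4).
set.seed(1)
datamatrix <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
datamatrix[, 2] <- datamatrix[, 2] + 0.5 * datamatrix[, 1]  # add some correlation

dmpca <- princomp(datamatrix)
L <- unclass(dmpca$loadings)   # plain C x C matrix of eigenvectors

# Loadings are orthonormal: t(L) %*% L is the identity matrix.
ortho_err <- max(abs(crossprod(L) - diag(ncol(L))))

# Scores are the centered data projected onto the loadings.
centered <- sweep(datamatrix, 2, dmpca$center)
score_err <- max(abs(centered %*% L - dmpca$scores))

# Score columns have mean zero and are mutually orthogonal.
mean_err <- max(abs(colMeans(dmpca$scores)))
G <- crossprod(dmpca$scores)
offdiag_err <- max(abs(G[upper.tri(G)]))

# sdev^2 are the eigenvalues of the covariance matrix computed with
# divisor n (princomp's convention), not n - 1 as in cov().
n <- dmpca$n.obs
eig_err <- max(abs(dmpca$sdev^2 - eigen(cov(datamatrix) * (n - 1) / n)$values))

print(c(ortho_err, score_err, mean_err, offdiag_err, eig_err))
```

Using crossprod() checks all pairs (i, j) at once instead of looping over
sum(loadings[,i]*loadings[,j]).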

That is all as it should be.

In my case R < C, which makes the sample covariance matrix singular for
standard PCA, but projection-pursuit methods, like PCAproj, are designed to
handle this.  Also, I had "de-meaned" the data beforehand, so
apply(datamatrix,2,mean) produces a length C vector of (effectively) zeroes.
I ran the following:
dmpcaprj=PCAproj(datamatrix,k=4,CalcMethod="sphere",update=TRUE)
to get the first four robust components.  When I look at the princomp object 
returned as dmpcaprj, some of the results are just what I expect.  For 
example,
-	dmpcaprj$loadings has dimensions Cx4, as expected, and the first four
		eigenvectors of the (robust) covariance matrix are orthonormal:
		sum(dmpcaprj$loadings[,i]*dmpcaprj$loadings[,j]) = 1 if i == j,
		~ 0 if i != j
-	dmpcaprj$sdev contains the square roots of the four corresponding
		eigenvalues.
-	dmpcaprj$n.obs equals R.
-	dmpcaprj$scores has dimensions Rx4, as it should.

HOWEVER, the columns of dmpcaprj$scores are neither de-meaned nor 
orthogonal.  So,
		apply(dmpcaprj$scores,2,mean) is a non-zero vector, and
		sum(dmpcaprj$scores[,i]*dmpcaprj$scores[,j]) != 0 if i != j, > 0 if i == j
ALSO,
-	dmpcaprj$scale is in this case a vector of all 1's, as expected.  But the
		length is C, not R.
-	dmpcaprj$center is a vector of length C, not R, and the entries are not
		equal to either apply(datamatrix,1,mean) or apply(datamatrix,2,mean);
		I can't figure out where they came from.

One interesting thing is that each column of the Rx4 matrix,
		dmpcaprj$scores - datamatrix%*%dmpcaprj$loadings
is constant: every row equals apply(dmpcaprj$scores,2,mean), because
apply(datamatrix%*%dmpcaprj$loadings,2,mean) is a length four vector of
(effectively) zeroes.  But I can't interpret the values of these means of
dmpcaprj$scores.
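
One possible reading of that constant offset, offered as an assumption
rather than a statement of what PCAproj does: if the scores are formed by
subtracting some center vector other than the column means before projecting
onto the loadings, then scores - datamatrix%*%loadings has constant columns
equal to -center%*%loadings.  The base-R sketch below illustrates only the
algebra; the column-median center, the random orthonormal L and all variable
names are hypothetical stand-ins, not pcaPP's computation:

```r
set.seed(2)
R <- 6; C <- 10; k <- 4                      # fewer rows than columns, as in the post
X <- matrix(rexp(R * C), nrow = R)           # skewed data, so medians differ from means
X <- sweep(X, 2, colMeans(X))                # de-mean the columns beforehand

center <- apply(X, 2, median)                # hypothetical robust center (not pcaPP's)
L <- qr.Q(qr(matrix(rnorm(C * k), C, k)))    # arbitrary C x k orthonormal "loadings"

scores <- sweep(X, 2, center) %*% L          # assumed: center first, then project
gap <- scores - X %*% L                      # the matrix examined in the post

shift <- drop(-center %*% L)                 # length-k vector
const_err <- max(abs(sweep(gap, 2, shift)))  # ~0: every row of gap equals shift
mean_gap <- max(abs(colMeans(scores) - shift))  # ~0: score means equal shift

print(c(const_err, mean_gap))
```

If this reading applies, comparing apply(dmpcaprj$scores,2,mean) with
-dmpcaprj$center%*%dmpcaprj$loadings on the real object should show
agreement up to rounding; if it does not, the offset has another origin.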

Can anyone please explain to me what is happening with the scores, scale, 
and center parts of the PCAproj results?

Thanks!

--  TMK  --