[R] Questions about results from PCAproj for robust principal component analysis
Talbot Katz
topkatz at msn.com
Wed Feb 14 21:22:55 CET 2007
Professor Filzmoser,
Thank you so much for the detailed response. It is very helpful.
-- TMK --
212-460-5430 home
917-656-5351 cell
>From: Peter Filzmoser <P.Filzmoser at tuwien.ac.at>
>To: Talbot Katz <topkatz at msn.com>
>CC: r-help at stat.math.ethz.ch
>Subject: Re: Questions about results from PCAproj for robust principal
>component analysis
>Date: Wed, 14 Feb 2007 20:43:13 +0100
>
>Hi,
>
>PCAproj is mainly designed for robust PCA and not for classical PCA.
>Therefore, applying classical estimators to the results of a
>robust PCA, like taking the mean of the robust PCA scores, will
>usually not give zeros. The robust PCs have been centred robustly, and
>not classically by the mean.
>In your case you ran PCAproj with the default method="mad", thus
>robust PCA was performed, maximizing the MAD instead of the usual
>standard deviation. Moreover, you used the default for
>centring the PCs, which is center="l1median", so a robust centring
>rather than centring by the classical mean (which would have been
>the zero vector because your data were already classically centred).
>Run the procedure with the options
>center=mean and method="sd"
>which then gives classical PCA. The mean of the resulting PCA
>scores will be 0. However, the scores will not be orthogonal
>(or uncorrelated), but close to it. The reason is that PCAproj
>is an approximate algorithm for finding the (robust) PCs.
>The eigen-analysis of princomp gives the exact solution (but
>it is not robust).
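>For illustration, a comparison along these lines (just a sketch, with a
>small simulated matrix standing in for your data) could look like:
>
>  library(pcaPP)
>  set.seed(1)
>  X <- matrix(rnorm(200 * 5), ncol = 5)       # 200 observations, 5 variables
>  pc_cl  <- PCAproj(X, k = 2, method = "sd", center = mean)  # "classical" settings
>  pc_ref <- princomp(X)                       # exact eigen-based classical PCA
>  colMeans(pc_cl$scores)    # effectively zero, since centring was by the mean
>  crossprod(pc_cl$scores)   # only approximately diagonal (projection pursuit)
>  crossprod(pc_ref$scores)  # diagonal up to numerical error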
>
>The wrong length of result$scale and result$center is definitely
>an error in the procedure that I will have to fix quickly.
>For data sets with more columns than rows we automatically
>apply a singular value decomposition (SVD) to reduce the
>dimensionality (without information loss). Then we perform
>centring and scaling in the space of reduced dimension, but
>we did not back-transform these values - sorry. Will be repaired
>soon.
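>To sketch the reduction idea itself (not the exact package internals):
>with more columns than rows the observations span an at most R-dimensional
>subspace, and an SVD recovers that subspace without information loss, e.g.
>
>  set.seed(2)
>  Y <- matrix(rnorm(10 * 50), nrow = 10)  # R = 10 rows, C = 50 columns
>  sv <- svd(Y)                            # sv$v is C x R with orthonormal columns
>  Yred <- Y %*% sv$v                      # R x R representation of the data
>  max(abs(Yred %*% t(sv$v) - Y))          # back-transformation recovers Y
>                                          # (up to numerical error)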
>
>Best regards,
>Peter
>
>
>Talbot Katz wrote:
>>Hi.
>>
>>I have been looking at the PCAproj function in package pcaPP (R 2.4.1) for
>>robust principal components, and I'm trying to interpret the results. I
>>started with a data matrix of dimensions RxC (R is the number of rows /
>>observations, C the number of columns / variables). PCAproj returns a
>>list of class princomp, similar to the output of the function princomp.
>>In a case where I can run princomp, I would get the following, from
>>executing dmpca = princomp(datamatrix) :
>>- the vector sdev, of length C, contains the standard deviations of the
>>  components in order of descending value; their squares are the
>>  eigenvalues of the covariance matrix
>>- the matrix loadings, of dimension CxC, has the eigenvectors of the
>>  covariance matrix as its columns, in the same order as the sdev vector;
>>  the columns are orthonormal:
>>    sum(dmpca$loadings[,i]*dmpca$loadings[,j]) = 1 if i == j, ~ 0 if i != j
>>- the vector center, of length C, contains the means of the variable
>>  columns of the original data matrix, in the same order as the original
>>  columns
>>- the vector scale, of length C, contains the scalings applied to each
>>  variable, in the same order as the original columns
>>- n.obs contains the number of observations used in the computation; this
>>  number equals R when there is no missing data
>>- the matrix scores, of dimension RxC, is the projection of the
>>  (mean-centred) original data onto the eigenvectors in loadings; the
>>  columns of scores are the principal components. princomp removes the
>>  mean, so the formula is:
>>    dmpca$scores = t(t(datamatrix) - dmpca$center) %*% dmpca$loadings
>>  and apply(dmpca$scores,2,mean) returns a length C vector of (effectively)
>>  zeroes; the principal components (columns of scores) are also orthogonal
>>  (but not orthonormal):
>>    sum(dmpca$scores[,i]*dmpca$scores[,j]) ~ 0 if i != j, > 0 if i == j
>>- call contains the function call, in this case princomp(x = datamatrix)
>>
>>That is all as it should be.
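>>For example, all of the above can be verified directly with a small
>>simulated stand-in for datamatrix:
>>  set.seed(3)
>>  datamatrix <- matrix(rnorm(100 * 6), ncol = 6)   # R = 100, C = 6
>>  dmpca <- princomp(datamatrix)
>>  max(abs(crossprod(dmpca$loadings) - diag(6)))    # ~0: loadings orthonormal
>>  recon <- t(t(datamatrix) - dmpca$center) %*% dmpca$loadings
>>  max(abs(recon - dmpca$scores))                   # ~0: matches the scores
>>  apply(dmpca$scores, 2, mean)                     # effectively zero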
>>
>>
>>In my case R < C, which produces singular results for standard PCA, but
>>robust methods, like PCAproj, are designed to handle this. Also, I had
>>"de-meaned" the data beforehand, so apply(datamatrix,2,mean) produces a
>>length C vector of (effectively) zeroes. I ran the following:
>>dmpcaprj=PCAproj(datamatrix,k=4,CalcMethod="sphere",update=TRUE)
>>to get the first four robust components. When I look at the princomp
>>object returned as dmpcaprj, some of the results are just what I expect.
>>For example,
>>- dmpcaprj$loadings has dimensions Cx4, as expected, and the first four
>>  eigenvectors of the (robust) covariance matrix are orthonormal:
>>    sum(dmpcaprj$loadings[,i]*dmpcaprj$loadings[,j]) = 1 if i == j, ~ 0 if i != j
>>- dmpcaprj$sdev contains the square roots of the four corresponding
>>  eigenvalues.
>>- dmpcaprj$n.obs equals R.
>>- dmpcaprj$scores has dimensions Rx4, as it should.
>>
>>HOWEVER, the columns of dmpcaprj$scores are neither de-meaned nor
>>orthogonal. So,
>>    apply(dmpcaprj$scores,2,mean) is a non-zero vector, and
>>    sum(dmpcaprj$scores[,i]*dmpcaprj$scores[,j]) != 0 if i != j, > 0 if i == j
>>ALSO,
>>- dmpcaprj$scale is in this case a vector of all 1's, as expected, but its
>>  length is R, not C.
>>- dmpcaprj$center is a vector of length R, not C, and the entries are not
>>  equal to either apply(datamatrix,1,mean) or apply(datamatrix,2,mean); I
>>  can't figure out where they came from.
>>
>>One interesting thing is that the columns of the Rx4 matrix
>>    dmpcaprj$scores - datamatrix%*%dmpcaprj$loadings
>>are all identically constant vectors, such that each row equals
>>apply(dmpcaprj$scores,2,mean) (since apply(datamatrix%*%dmpcaprj$loadings,2,mean)
>>is a length four vector of (effectively) zeroes), but I can't interpret
>>the values of these means of dmpcaprj$scores.
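>>(In code, reusing datamatrix and dmpcaprj from above, this check is roughly:
>>    D <- dmpcaprj$scores - datamatrix%*%dmpcaprj$loadings
>>    apply(D,2,sd)                                 # effectively zero: columns constant
>>    rbind(D[1,], apply(dmpcaprj$scores,2,mean))   # the two rows agree
>>    apply(datamatrix%*%dmpcaprj$loadings,2,mean)  # effectively zero
>>)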
>>
>>
>>Can anyone please explain to me what is happening with the scores, scale,
>>and center parts of the PCAproj results?
>>
>>
>>Thanks!
>>
>>
>>-- TMK --
>>212-460-5430 home
>>917-656-5351 cell
>>
>>
>>
>>
>
>
>--
>-------------------------------------------------------
>From: Prof. Dr. Peter Filzmoser
> Dept. of Statistics & Probability Theory
> Vienna University of Technology
> Wiedner Hauptstrasse 8-10
> A-1040 Vienna, Austria
> Tel. +43 1 58801/10733
> Fax. +43 1 58801/10799
> E-mail: P.Filzmoser at tuwien.ac.at
> Internet:
> http://www.statistik.tuwien.ac.at/public/filz/
>-------------------------------------------------------
>
>
>