# [R] R vs SPSS output for princomp

James Howison jhowison at syr.edu
Tue May 6 16:59:38 CEST 2003

On Tuesday, May 6, 2003, at 03:00  AM, Prof Brian Ripley wrote:

> On Mon, 5 May 2003, James Howison wrote:
>
>> I am using R to do a principal components analysis for a class
>> which is generally using SPSS - so some of my question relates to
>> SPSS output (and this might not be the right place).  I have
>> scoured the mailing list and the web but can't get a feel for this.
>> It is annoying because they will be marking to the SPSS output.
>>
>> The loadings differ in SPSS and in R - I suspect that there is some normalization or
>> scaling going on that I don't understand (and there is plenty I
>> don't understand).  The scree-plots (and thus eigen values for each
>> component) and Proportion of Variance figures are identical - but
>> the SPSS loadings are much higher than those shown by R.
>>
>> Should the loadings returned by the R princomp function and the
>> SPSS "Component Matrix" be the same?
>
> Only if they are defined the same.  The length of a PCA loading is
> arbitrary.  R's are of length (sum of squares of coefficients) one:
> how are SPSS's defined?

I believe, based on the "Factor Score Coefficients" section of the
SPSS algorithm document, that these are the calculations SPSS is using
(am I right in thinking that R's "loadings" are also factor score
coefficients?):

http://www.spss.com/tech/stat/Algorithms/11.5/factor.pdf

To quote (in pseudo-LaTeX):

The matrix of factor loadings based on m factors is:

\Lambda_m = \Omega_m {\Gamma_m}^{\frac{1}{2}}

where

\Omega_m = (\omega_1, \omega_2, \ldots, \omega_m)
\Gamma_m = \mathrm{diag}(|\gamma_1|, |\gamma_2|, \ldots, |\gamma_m|)

For a correlation matrix,

\gamma_1 \ge \gamma_2 \ge \ldots \ge \gamma_m are the eigenvalues and
\omega_i are the corresponding eigenvectors of R, where R is the
correlation matrix.

(skipping down to the bottom of the document, the factor score
coefficient matrix is)

W = \Lambda_m {\Gamma_m}^{-1}

where

S_m = factor structure matrix, and
\Lambda_m = S_m for orthogonal rotations

I'm afraid that my mathematical skills are not up to comparing the
algorithms explained in the SPSS document with the R source code :(
Hopefully the difference is obvious to somebody here.
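For what it's worth, here is my attempt to check the two formulas
numerically in R (using the built-in USArrests data purely as a
stand-in, since I can't post the class data):

```r
## USArrests is only a stand-in dataset here.
R <- cor(USArrests)
e <- eigen(R)

## SPSS "Component Matrix": Lambda_m = Omega_m %*% Gamma_m^{1/2},
## i.e. each eigenvector scaled by the square root of its eigenvalue.
Lambda <- e$vectors %*% diag(sqrt(e$values))

## R's princomp loadings are the bare eigenvectors (columns with unit
## sum of squares), so Lambda should equal the loadings with each
## column multiplied by the component sdev:
p <- princomp(USArrests, cor = TRUE)
all.equal(abs(Lambda),
          abs(unname(unclass(p$loadings) %*% diag(p$sdev))))
## TRUE (column signs are arbitrary, hence abs())

## Factor score coefficients: W = Lambda_m %*% Gamma_m^{-1}
W <- Lambda %*% diag(1 / e$values)

## Sanity check: scoring the standardized data with W should give
## components with unit variance:
scores <- scale(USArrests) %*% W
apply(scores, 2, var)   # all 1, up to rounding
```

If this is right, the only difference between the two programs'
loadings is the column scaling by sqrt(eigenvalue).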

>> A subsidiary question would be: how does one approximate
>> "Kaiser's little jiffy" test for extracting the components (SPSS
>> by default eliminates those components with eigenvalues below 1)?
>> I could eyeball the scree plot (to set x) - but is there another way?
>
> Eigenvalues of what exactly?  The component sdevs are the square
> roots of the eigenvalues of the (possibly scaled) covariance matrix:
> maybe you intend this only for a correlation matrix?

Yes I do - I'm using only the correlation matrix.  I understood that it
was common (following Kaiser's suggestion) to extract only components
which have eigenvalues above 1 (i.e. explain at least as much variance
as one of the input variables).  I understand that this is considered
statistically crude, but it is still common.
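In case it helps anyone searching the archives, the crude version I
have settled on for now (again with USArrests standing in for the real
data): for a correlation-matrix PCA the component variances (sdev^2)
are the eigenvalues, so Kaiser's rule is just a comparison against 1.

```r
p <- princomp(USArrests, cor = TRUE)   # USArrests as a stand-in
eigenvalues <- p$sdev^2                # eigenvalues of the correlation matrix
keep <- eigenvalues > 1                # Kaiser: retain eigenvalues above 1
p$loadings[, keep, drop = FALSE]       # the retained components
```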

I guess I'm expecting an interface for PCA not too dissimilar to that
of factanal (as it is in other statistical packages).  Perhaps there
are sound statistical reasons for not wanting to hide this step from
the user, but perhaps it is interesting for you to know people's
expectations when using the princomp function.
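To make the expectation concrete, something like the following sketch
is what I imagined (princompExtract is a made-up name of my own, not
an existing R function, and USArrests again stands in for real data):

```r
## Hypothetical factanal-like wrapper; 'princompExtract' is my own
## invented name and not part of R.
princompExtract <- function(x) {
  p <- princomp(x, cor = TRUE)
  keep <- p$sdev^2 > 1                    # Kaiser's rule by default
  list(loadings = p$loadings[, keep, drop = FALSE],
       sdev     = p$sdev[keep],
       dropped  = sum(!keep))             # how many components were cut
}
fit <- princompExtract(USArrests)
fit$sdev
```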

> In R you have the source code, so if you know what you want you can
> find
> the pieces.

Apologies that this is a bit beyond me right at the moment.  I do,
however, appreciate having the source available.

James
Doctoral Student
School of Information Studies
Syracuse University

> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595