Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Oct 3 13:29:28 CEST 2000
> From: Christine Serres <serres at valigen.net>
> I've used the example given in the documentation for the prcomp function
> both in R and SPAD to compare the results obtained.
> Surprisingly, I do not obtain the same results for the coordinates of
> the principal components with these two software packages.
> using USArrests data I obtain with R :
> > summary(prcomp(USArrests))
> Importance of components:
>                            PC1     PC2    PC3     PC4
> Standard deviation      83.732 14.2124 6.4894 2.48279
> Proportion of Variance   0.966  0.0278 0.0058 0.00085
> Cumulative Proportion    0.966  0.9933 0.9991 1.00000
Read on:
> summary(prcomp(USArrests, scale=T))
Importance of components:
                        PC1   PC2    PC3    PC4
Standard deviation     1.57 0.995 0.5971 0.4164
Proportion of Variance 0.62 0.247 0.0891 0.0434
Cumulative Proportion  0.62 0.868 0.9566 1.0000
> And using SPAD (french editor CISIA) :
> Ex: sd pv cp
> comp1 | 2.4802 | 62.01 | 62.01 |
> comp2 | 0.9898 | 24.74 | 86.75 |
> comp3 | 0.3566 | 8.91 | 95.66 |
> comp4 | 0.1734 | 4.34 | 100.00 |
Also
> summary(princomp(USArrests, cor=T))
Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5748783 0.9948694 0.5971291 0.41644938
Proportion of Variance 0.6200604 0.2474413 0.0891408 0.04335752
Cumulative Proportion  0.6200604 0.8675017 0.9566425 1.00000000
BTW, it looks like SPAD's `sd' are in fact variances, for the square
of the first line here is
   Comp.1    Comp.2    Comp.3    Comp.4
2.4802416 0.9897652 0.3565632 0.1734301
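A quick way to check this yourself, as a minimal sketch: square the standard deviations from the scaled prcomp fit and compare them with SPAD's `sd' column. The names pc and variances are only illustrative, and the scaling argument is written out in full as scale. here.

```r
# Scaled PCA on the built-in USArrests data
pc <- prcomp(USArrests, scale. = TRUE)

# Squared standard deviations, i.e. the variances of the components
variances <- pc$sdev^2

# Should agree with SPAD's `sd' column: 2.4802 0.9898 0.3566 0.1734
print(round(variances, 4))

# With scaled variables the variances add up to the number of variables (4)
print(sum(variances))
```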
> Am I using R wrongly? Why are the results so different?
In this dataset you do want scaling, as the variables are not on a
common scale. But SPAD has apparently scaled by default, and
apparently mis-labelled its results.
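To see why scaling matters for this dataset, here is a small sketch (the variable names are mine, not from any package): it looks at the raw variances of the four variables, and checks that scaling inside prcomp gives the same standard deviations as standardising the data first with scale().

```r
# Raw variances of the four variables: they are on very different scales
raw_var <- sapply(USArrests, var)
print(raw_var / sum(raw_var))

# Without scaling, the first component takes almost all of the variance
unscaled <- prcomp(USArrests)
share_pc1 <- unscaled$sdev[1]^2 / sum(unscaled$sdev^2)
print(share_pc1)

# Scaling inside prcomp is equivalent to standardising the data beforehand
scaled_a <- prcomp(USArrests, scale. = TRUE)
scaled_b <- prcomp(scale(USArrests))
same_sdev <- all.equal(scaled_a$sdev, scaled_b$sdev)
print(same_sdev)
```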
> Furthermore, could anyone explain to me the difference between prcomp and
> princomp, since we do not obtain exactly the same results using these
> two functions.
They differ in the definition of variance: princomp uses divisor n,
prcomp uses n-1. It's on the help page for princomp!
If you scale, there is no difference in the standard deviations;
otherwise they differ by a factor of sqrt((n-1)/n). The reasons are both
S-PLUS compatibility and to allow princomp to use robust principal
components.
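The divisor difference can be checked directly. The following is a sketch on the same USArrests data, with illustrative object names; it compares the standard deviations from the two functions with and without scaling.

```r
n <- nrow(USArrests)

# Unscaled: princomp uses divisor n, prcomp uses divisor n - 1
p_prcomp   <- prcomp(USArrests)
p_princomp <- princomp(USArrests)
unscaled_match <- all.equal(unname(p_princomp$sdev),
                            p_prcomp$sdev * sqrt((n - 1) / n))
print(unscaled_match)

# Scaled (correlation-based): the divisor cancels, so the sdevs agree
s_prcomp   <- prcomp(USArrests, scale. = TRUE)
s_princomp <- princomp(USArrests, cor = TRUE)
scaled_match <- all.equal(unname(s_princomp$sdev), s_prcomp$sdev)
print(scaled_match)
```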
> And how to obtain the coordinates of the points on the first component
> using R ?
predict on a princomp fit, or retx=TRUE on a prcomp fit (the coordinates
are then in the x component of the result).
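As a sketch of both routes (object names are illustrative): the scores on the first component come from the x component of a prcomp fit, or from predict() on a princomp fit. Note that the two sets of scores need not be identical: the sign of a component is arbitrary, and with cor=TRUE princomp standardises with divisor n, so the scores differ by a factor of sqrt(n/(n-1)).

```r
n <- nrow(USArrests)

# prcomp: the rotated data are returned in $x when retx = TRUE
pc <- prcomp(USArrests, scale. = TRUE, retx = TRUE)
scores_prcomp <- pc$x[, 1]
print(head(scores_prcomp))

# princomp: predict() with no new data returns the stored scores
fit <- princomp(USArrests, cor = TRUE)
scores_princomp <- predict(fit)[, 1]
print(head(scores_princomp))

# Same coordinates up to sign and the n versus n - 1 scaling
scores_match <- all.equal(unname(abs(scores_princomp)),
                          unname(abs(scores_prcomp)) * sqrt(n / (n - 1)))
print(scores_match)
```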
You will find all this in Venables & Ripley, for example.
