[R] sdev value returned by princomp function (used for PCA)

peter dalgaard pdalgd at gmail.com
Thu Jun 30 23:02:46 CEST 2011


On Jun 30, 2011, at 21:48 , Coghlan, Avril wrote:

> 
> Dear all,
> 
> I have a question about the 'sdev' value returned by the princomp function (which does principal components analysis).
> 
> On the help page for princomp it says 'sdev' is 'the standard deviations of the principal components'.
> 
> However, when I calculate the principal components for the USArrests data set, I don't find this to be the case:
> 
> Here is how I calculated the principal components and got the 'sdev' values:
>> scaledUSArrests <- scale(USArrests) # standardise the variables to have variance 1 and mean 0
>> myPCA <- princomp(scaledUSArrests)  # do the PCA
>> myPCA$sdev
>   Comp.1    Comp.2    Comp.3    Comp.4 
> 1.5590500 0.9848705 0.5911277 0.4122639 
> 
> As far as I understand, the principal components themselves are stored in the 'scores' value returned by princomp, so I calculated the standard deviations of those values to see if they agree with 'sdev':
>> sd(myPCA$scores)
>   Comp.1    Comp.2    Comp.3    Comp.4 
> 1.5748783 0.9948694 0.5971291 0.4164494 
> 
> I get different values than I got using sd(myPCA$scores) and myPCA$sdev, why is this?
> 
> As there are 4 standardised variables, and the variance of each standardised variable is 1, I would expect the variances of the principal components to add to 1. I find that this is true for the variances calculated using sd(myPCA$scores) but not myPCA$sdev:
>> (1.5590500^2) + (0.9848705^2) + (0.5911277^2) + (0.4122639^2)
>  3.92
>> (1.5748783^2) + (0.9948694^2) + (0.5971291^2) + (0.4164494^2)
>  4.00
> 
> I am wondering why are the values in 'sdev' not equal to the standard deviations of the principal components (as stored in 'scores'), and why is the total variance calculated by summing the squared 'sdev' values not equal to 4?
> 
> Sorry if I have misunderstood something obvious. I would be grateful for help as I'm a bit confused..

Rather, you overlooked something not-quite obvious:

"Note that the default calculation uses divisor N for the covariance matrix."

You'll have to ask the authors for why this convention has been used, but it is of course easy to convert to N-1 divisor once you know what is happening (typically you don't really care about scaling anyway). 

When your data have been scaled with their sd, using N-1 for divisor, the variance using N as divisor will be (1 - 1/N) and true enough:

4 * (1 - 1/50) = 3.92

> 
> Kind regards,
> Avril
> 
> Avril Coghlan
> Cork, Ireland
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list