[R] scale or not to scale that is the question - prcomp

Wed Aug 19 15:25:00 CEST 2009

On 19/08/2009 9:02 AM, Petr PIKAL wrote:
> Thank you
> 
> Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 19.08.2009 14:49:52:
> 
>> On 19/08/2009 8:31 AM, Petr PIKAL wrote:
>>> Dear all
>>>
> 
> <snip>
> 
>> I would say the answer depends on the meaning of the variables.  In the
>> unusual case that they are measured in dimensionless units, it might
>> make sense not to scale.  But if you are using arbitrary units of
>> measurement, do you want your answer to depend on them?  For example, if
>> you change from kg to mg, the numbers will become much larger, the
>> variable will contribute much more variance, and it will become a more
>> important part of the largest principal component.  Is that sensible?
> 
> Basically, the variables are percentages (all between 0 and 6%), except
> dus, which is present or not present (for the purpose of prcomp,
> transformed to 0/1 by as.numeric). The only other exception is iep, which
> is roughly in the range 5-8. So the ranges of all variables are quite
> similar.
> 
> What surprises me is that I can interpret the biplot without scaling in
> terms of the variables used, while the biplot with scaling is totally
> different, and the two pictures do not match at all. I would have
> expected only a small difference between the results from the two
> settings, as all the numbers are quite comparable and do not differ much.
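
The units point quoted above can be sketched on simulated data. This is only an illustration under assumptions: the data frames d_kg and d_mg and their columns weight and pct are made up here and are not the variables from this thread. It compares prcomp with and without scale. = TRUE after changing one column from kg to mg.

```r
# Hypothetical data (not the poster's): one weight column, one percentage column.
set.seed(1)
n <- 100
weight_kg <- rnorm(n, mean = 2, sd = 0.5)
pct       <- rnorm(n, mean = 3, sd = 1)

d_kg <- data.frame(weight = weight_kg,       pct = pct)
d_mg <- data.frame(weight = weight_kg * 1e6, pct = pct)  # same data, kg -> mg

# Unscaled PCA: the size of the weight loading on PC1 depends on the unit.
load_kg <- abs(prcomp(d_kg)$rotation["weight", 1])
load_mg <- abs(prcomp(d_mg)$rotation["weight", 1])

# Scaled PCA: each column is standardised first, so the unit drops out.
sds_kg <- prcomp(d_kg, scale. = TRUE)$sdev
sds_mg <- prcomp(d_mg, scale. = TRUE)$sdev

print(c(load_kg = load_kg, load_mg = load_mg))
print(all.equal(sds_kg, sds_mg))
```

In the unscaled fit the weight column should dominate PC1 once it is expressed in mg, while the scaled standard deviations should agree for both units.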

If you look at the standard deviations in the two cases, I think you can 
see why this happens:

Scaled:

Standard deviations:
[1] 1.3335175 1.2311551 1.0583667 0.7258295 0.2429397

Not Scaled:

Standard deviations:
[1] 1.0030048 0.8400923 0.5679976 0.3845088 0.1531582

The first two standard deviations are close, so the directions of the 
corresponding components are poorly determined and small changes to the 
data will affect them a lot.  Your biplots look at the 2nd and 3rd components.
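
A rough sketch of that instability, using simulated data rather than yours: the helper names angle_deg and pc1_instability, the sample size, and the chosen standard deviations are all assumptions made for illustration. It bootstraps the rows and measures how far the first component's direction moves, once when the two leading standard deviations are nearly equal and once when they are well separated.

```r
set.seed(42)
n <- 200

# Angle in degrees between two direction vectors, ignoring their sign.
angle_deg <- function(a, b) {
  cosine <- min(1, abs(sum(a * b)) / sqrt(sum(a^2) * sum(b^2)))
  acos(cosine) * 180 / pi
}

# Median angle between PC1 of the full data and PC1 of bootstrap resamples.
pc1_instability <- function(x, n_boot = 200) {
  pc1 <- prcomp(x)$rotation[, 1]
  angles <- replicate(n_boot, {
    idx <- sample(nrow(x), replace = TRUE)
    angle_deg(pc1, prcomp(x[idx, ])$rotation[, 1])
  })
  median(angles)
}

# Hypothetical data: independent columns with chosen standard deviations.
close_sds     <- cbind(rnorm(n, sd = 1.00), rnorm(n, sd = 0.98), rnorm(n, sd = 0.3))
separated_sds <- cbind(rnorm(n, sd = 2.00), rnorm(n, sd = 0.50), rnorm(n, sd = 0.3))

angle_close     <- pc1_instability(close_sds)
angle_separated <- pc1_instability(separated_sds)

print(c(close = angle_close, separated = angle_separated))
```

With nearly equal leading standard deviations the first direction should swing by a much larger angle under resampling, and the second component, being orthogonal to it, moves along with it.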

Duncan Murdoch