[R] scale or not to scale that is the question - prcomp
murdoch at stats.uwo.ca
Wed Aug 19 15:25:00 CEST 2009
On 19/08/2009 9:02 AM, Petr PIKAL wrote:
> Thank you
> Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 19.08.2009 14:49:52:
>> On 19/08/2009 8:31 AM, Petr PIKAL wrote:
>>> Dear all
>> I would say the answer depends on the meaning of the variables. In the
>> unusual case that they are measured in dimensionless units, it might
>> make sense not to scale. But if you are using arbitrary units of
>> measurement, do you want your answer to depend on them? For example, if
>> you change from kg to mg, the numbers will become much larger, the
>> variable will contribute much more variance, and it will become a more
>> important part of the largest principal component. Is that sensible?
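
A minimal sketch of the units point above, using simulated data. The
variable names (mass_kg, length_m, score) and all values are made up for
illustration; they are not the poster's data.

```r
# Simulated data; names and values are hypothetical.
set.seed(1)
n <- 50
d <- data.frame(
  mass_kg  = rnorm(n, mean = 2,   sd = 0.2),
  length_m = rnorm(n, mean = 1.7, sd = 0.1),
  score    = rnorm(n, mean = 5,   sd = 1)
)

# The same measurements, with mass expressed in mg instead of kg.
d_mg <- data.frame(
  mass_mg  = d$mass_kg * 1e6,
  length_m = d$length_m,
  score    = d$score
)

# Without scaling, the first component follows whichever variable has the
# largest variance in the chosen units.
pc_kg <- prcomp(d,    scale. = FALSE)
pc_mg <- prcomp(d_mg, scale. = FALSE)
round(abs(pc_kg$rotation[, 1]), 3)  # dominated by score
round(abs(pc_mg$rotation[, 1]), 3)  # dominated by mass_mg

# With scaling, the change of units leaves the standard deviations unchanged.
pc_kg_s <- prcomp(d,    scale. = TRUE)
pc_mg_s <- prcomp(d_mg, scale. = TRUE)
all.equal(pc_kg_s$sdev, pc_mg_s$sdev)
```

In the unscaled fits, the first component switches from score to mass purely
because of the change of units; the scaled fits are unaffected.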
> Basically the variables are percentages (all between 0 and 6%), except dus,
> which is present or not present (transformed to 0/1 by as.numeric for the
> purpose of prcomp). The only other exception is iep, which is roughly in
> the range 5-8. So the ranges of all variables are quite similar.
> What surprises me is that I can interpret the biplot without scaling in
> terms of the variables used, while the biplot with scaling is totally
> different, and the two pictures do not match at all. I would have expected
> only a small difference between the results from the two settings, as all
> the numbers are quite comparable and do not differ much.
If you look at the standard deviations in the two cases, I think you can
see why this happens:
 1.3335175 1.2311551 1.0583667 0.7258295 0.2429397
 1.0030048 0.8400923 0.5679976 0.3845088 0.1531582
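
The two rows above are not labelled. A rough way to tell them apart, offered
as a sketch: with scale. = TRUE every variable has unit variance, so the
squared sds should sum to the number of variables. The names sd_a and sd_b
are just labels chosen here for the first and second row.

```r
# Standard deviations as quoted above (first row, second row).
sd_a <- c(1.3335175, 1.2311551, 1.0583667, 0.7258295, 0.2429397)
sd_b <- c(1.0030048, 0.8400923, 0.5679976, 0.3845088, 0.1531582)

# Total variance in each case; a total equal to the number of components
# is what a scaled analysis of that many variables would give.
sum(sd_a^2)
sum(sd_b^2)

# Ratios of successive sds; a ratio near 1 means two neighbouring
# components explain almost the same amount of variance.
ratio_a <- sd_a[-length(sd_a)] / sd_a[-1]
ratio_b <- sd_b[-length(sd_b)] / sd_b[-1]
round(rbind(ratio_a, ratio_b), 2)
```

The first row sums to very nearly 5, which suggests (but does not prove) that
it comes from the scaled analysis of five variables.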
The first two sds are close, so small changes to the data will change the
directions of the corresponding components a lot. Your biplots look at the
2nd and 3rd components.
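
To illustrate why close sds make the directions unstable, here is a small
sketch that works directly on made-up covariance matrices rather than on the
data from this thread. The helper lead_angle() and the matrices are
hypothetical, chosen only to make the effect visible.

```r
# Angle (in degrees) between the leading eigenvectors of two covariance
# matrices.
lead_angle <- function(S1, S2) {
  v1 <- eigen(S1, symmetric = TRUE)$vectors[, 1]
  v2 <- eigen(S2, symmetric = TRUE)$vectors[, 1]
  # abs() because the sign of an eigenvector is arbitrary
  acos(min(1, abs(sum(v1 * v2)))) * 180 / pi
}

# The same small perturbation in both cases: a covariance of 0.05 between
# variables 1 and 2.
E <- matrix(0, 3, 3)
E[1, 2] <- E[2, 1] <- 0.05

close_vars <- diag(c(1.01, 1.00, 0.20))  # leading variances nearly equal
apart_vars <- diag(c(2.00, 1.00, 0.20))  # leading variances well separated

angle_close <- lead_angle(close_vars, close_vars + E)
angle_apart <- lead_angle(apart_vars, apart_vars + E)
round(c(close = angle_close, apart = angle_apart), 1)
```

The same perturbation rotates the first component by roughly 42 degrees when
the leading variances are nearly equal, and by only about 3 degrees when they
are well separated.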