[R] scale or not to scale that is the question - prcomp
Duncan Murdoch
murdoch at stats.uwo.ca
Wed Aug 19 15:25:00 CEST 2009
On 19/08/2009 9:02 AM, Petr PIKAL wrote:
> Thank you
>
> Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 19.08.2009 14:49:52:
>
>> On 19/08/2009 8:31 AM, Petr PIKAL wrote:
>>> Dear all
>>>
>
> <snip>
>
>> I would say the answer depends on the meaning of the variables. In the
>> unusual case that they are measured in dimensionless units, it might
>> make sense not to scale. But if you are using arbitrary units of
>> measurement, do you want your answer to depend on them? For example, if
>> you change from kg to mg, the numbers will become much larger, the
>> variable will contribute much more variance, and it will become a more
>> important part of the largest principal component. Is that sensible?
>
> Basically the variables are percentages (all between 0 and 6%), except dus,
> which is present or not present (transformed to 0/1 by as.numeric for the
> purpose of prcomp :). The only other exception is iep, which is roughly in
> the range 5-8. So the ranges of all variables are quite similar.
>
> What surprises me is that I can interpret the biplot without scaling in
> terms of the variables used, while the biplot with scaling is totally
> different, and the two pictures do not match at all. I would have
> expected only a small difference between the results from the two
> settings, as all the numbers are quite comparable and do not differ much.

If you look at the standard deviations in the two cases, I think you can
see why this happens:

Scaled:
Standard deviations:
[1] 1.3335175 1.2311551 1.0583667 0.7258295 0.2429397

Not Scaled:
Standard deviations:
[1] 1.0030048 0.8400923 0.5679976 0.3845088 0.1531582

The first two standard deviations are close, so small changes to the data
will affect the directions of those components a lot. Your biplots look at
the 2nd and 3rd components.
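
To make the comparison concrete, here is a small sketch that works only
from the two printouts above. The helper name sd_ratio is an illustrative
choice, not part of prcomp; it divides each standard deviation by the one
before it, so values near 1 flag neighbouring components that explain
almost the same variance and whose directions are therefore poorly
determined.

```r
# Standard deviations copied from the two prcomp() printouts above
sd_scaled   <- c(1.3335175, 1.2311551, 1.0583667, 0.7258295, 0.2429397)
sd_unscaled <- c(1.0030048, 0.8400923, 0.5679976, 0.3845088, 0.1531582)

# Hypothetical helper: ratio of each sd to the preceding one.
# A ratio close to 1 means the two components are hard to tell apart.
sd_ratio <- function(s) s[-1] / s[-length(s)]

ratio_scaled   <- sd_ratio(sd_scaled)
ratio_unscaled <- sd_ratio(sd_unscaled)

print(round(ratio_scaled, 2))
print(round(ratio_unscaled, 2))

# Sanity check: with scale. = TRUE the component variances should add up
# to the number of variables (five here)
total_var_scaled <- sum(sd_scaled^2)
print(total_var_scaled)
```

The first ratio is noticeably closer to 1 in the scaled case than in the
unscaled one. If the fitted objects are available, the same check could be
run on their sdev component, and biplot() takes a choices argument (for
example choices = 2:3) to select which components are drawn.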
Duncan Murdoch
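
P.S. The point quoted above about units of measurement can be tried on
made-up data. The sketch below is only an illustration: the data frame,
its column names (weight, pct_a, pct_b) and the simulated values are all
assumptions, not the poster's data. It refits prcomp after converting one
column from kg to mg, with and without scale. = TRUE.

```r
set.seed(1)

# Made-up data: one weight (in kg) and two percentages between 0 and 6
d_kg <- data.frame(weight = rnorm(50, mean = 2, sd = 0.5),
                   pct_a  = runif(50, 0, 6),
                   pct_b  = runif(50, 0, 6))

# Same measurements, with the weight expressed in mg instead of kg
d_mg <- d_kg
d_mg$weight <- d_mg$weight * 1e6

# Without scaling, the result depends on the unit chosen for weight
pc_kg_raw <- prcomp(d_kg, scale. = FALSE)
pc_mg_raw <- prcomp(d_mg, scale. = FALSE)

# With scaling, each column is divided by its own standard deviation,
# so the change of unit cancels out
pc_kg_std <- prcomp(d_kg, scale. = TRUE)
pc_mg_std <- prcomp(d_mg, scale. = TRUE)

# Loading of weight on the first component, unscaled fits
print(abs(pc_kg_raw$rotation["weight", 1]))
print(abs(pc_mg_raw$rotation["weight", 1]))

# Standard deviations of the scaled fits should agree
print(all.equal(pc_kg_std$sdev, pc_mg_std$sdev))
```

In the unscaled fits, the weight column takes over the first component
once it is expressed in mg, while the scaled fits are unaffected by the
change of unit.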