# [R] scale or not to scale that is the question - prcomp

Petr PIKAL petr.pikal at precheza.cz
Wed Aug 19 16:14:24 CEST 2009

```
Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 19.08.2009 15:25:00:

> On 19/08/2009 9:02 AM, Petr PIKAL wrote:
> > Thank you
> >
> > Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 19.08.2009 14:49:52:
> >
> >> On 19/08/2009 8:31 AM, Petr PIKAL wrote:
> >>> Dear all
> >>>
> >
> > <snip>
> >
> >> I would say the answer depends on the meaning of the variables.  In the
> >> unusual case that they are measured in dimensionless units, it might
> >> make sense not to scale.  But if you are using arbitrary units of
> >> measurement, do you want your answer to depend on them?  For example, if
> >> you change from kg to mg, the numbers will become much larger, the
> >> variable will contribute much more variance, and it will become a more
> >> important part of the largest principal component.  Is that sensible?
> >
> > Basically the variables are percentages (all between 0 and 6%) except dus,
> > which is present or not present (for the purpose of prcomp transformed to
> > 0/1 by as.numeric:). The only variable which is not such is iep, which is
> > basically in the range 5-8. So the ranges of all variables are quite similar.
> >
> > What surprises me is that I can interpret the biplot without scaling in
> > terms of the variables used, while the biplot with scaling is totally
> > different and those two pictures do not match at all. This is what
> > surprised me, as I would have expected just a small difference between
> > the results from those two settings, since all numbers are quite
> > comparable and do not differ much.
>
>
> If you look at the standard deviations in the two cases, I think you can
> see why this happens:
>
> Scaled:
>
> Standard deviations:
> [1] 1.3335175 1.2311551 1.0583667 0.7258295 0.2429397
>
> Not Scaled:
>
> Standard deviations:
> [1] 1.0030048 0.8400923 0.5679976 0.3845088 0.1531582
>
>
> The first two sds are close, so small changes to the data will affect

I see. But I would expect that changes to data made by scaling would not
change it in such a way that unscaled and scaled results are completely
different.

> their direction a lot.  Your biplots look at the 2nd and 3rd components.

Yes because grouping in 2nd and 3rd component biplot can be easily
explained by values of some variables (without scaling).

I must admit that I do not use prcomp very often, and usually scaling
gives me an "explainable" result, especially if I use it for "variable
reduction". Therefore I am reluctant to use it in this case.

When I try a "more standard" way

> fit<-lm(iep~sio2+al2o3+p2o5+as.numeric(dus), data=rglp)
> summary(fit)

Call:
lm(formula = iep ~ sio2 + al2o3 + p2o5 + as.numeric(dus), data = rglp)

Residuals:
     Min       1Q   Median       3Q      Max
-0.41751 -0.15568 -0.03613  0.20124  0.43046

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      7.12085    0.62257  11.438 8.24e-08 ***
sio2            -0.67250    0.20953  -3.210 0.007498 **
al2o3            0.40534    0.08641   4.691 0.000522 ***
p2o5            -0.76909    0.11103  -6.927 1.59e-05 ***
as.numeric(dus) -0.64020    0.18101  -3.537 0.004094 **

I get a quite plausible result which can be interpreted without problems.

My data are the result of a designed experiment (more or less :) and
therefore all variables are significant. Is that the reason why scaling
may be inappropriate in this case?

Regards
Petr Pikal

>
> Duncan Murdoch

```
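
The point made in the thread about units of measurement can be tried out with a small sketch. Everything in it is an assumption for illustration only: the data are simulated, the column names `a`, `b`, `c` are hypothetical, and none of it reproduces the `rglp` data discussed above. It multiplies one column by 1000 (as when changing from kg to a smaller unit) and compares `prcomp` with and without `scale.`.

```r
# Synthetic data only: these are NOT the rglp measurements from the thread.
set.seed(1)
n <- 50
x <- data.frame(
  a = rnorm(n, mean = 3, sd = 1.0),
  b = rnorm(n, mean = 3, sd = 0.9),
  c = rnorm(n, mean = 6, sd = 0.5)
)

# Same data, with column `a` expressed in a unit 1000 times smaller.
x_units <- transform(x, a = a * 1000)

# Without scaling: compare the loadings of the first component.
pc_raw       <- prcomp(x,       scale. = FALSE)
pc_raw_units <- prcomp(x_units, scale. = FALSE)
print(abs(pc_raw$rotation[, 1]))
print(abs(pc_raw_units$rotation[, 1]))

# With scaling: compare the component standard deviations.
pc_scaled       <- prcomp(x,       scale. = TRUE)
pc_scaled_units <- prcomp(x_units, scale. = TRUE)
print(all.equal(pc_scaled$sdev, pc_scaled_units$sdev))
```

In the unscaled fit the rescaled column should take over the first component, while the scaled fit should not depend on the unit chosen. This does not settle whether scaling is appropriate for a designed experiment; it only illustrates why the two settings can give different pictures.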