[R] scale or not to scale that is the question - prcomp
Duncan Murdoch
murdoch at stats.uwo.ca
Wed Aug 19 14:49:52 CEST 2009
On 19/08/2009 8:31 AM, Petr PIKAL wrote:
> Dear all
>
> here is my data called "rglp"
>
> structure(list(vzorek = structure(1:17, .Label = c("179/1/1",
> "179/2/1", "180/1", "181/1", "182/1", "183/1", "184/1", "185/1",
> "186/1", "187/1", "188/1", "189/1", "190/1", "191/1", "192/1",
> "R310", "R610L"), class = "factor"), iep = c(7.51, 7.79, 5.14,
> 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5, 7.3, 7.27, 6.46, 6.95,
> 6.32, 6.32, 6.34), skupina = c(7.34, 7.34, 5.14, 6.23, 6.23,
> 7.34, 6.23, 7.34, 6.23, 7.34, 7.34, 7.34, 6.23, 7.34, 6.23, 6.23,
> 6.23), sio2 = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863,
> 0.898, 0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
> p2o5 = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863,
> 0.775, 0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3,
> 2), al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621,
> 5.828, 4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6,
> 5), dus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("ano", "ne"), class =
> "factor")), .Names = c("vzorek",
> "iep", "skupina", "sio2", "p2o5", "al2o3", "dus"), class = "data.frame",
> row.names = c(NA,
> -17L))
>
> and I am trying to do a principal component analysis. Here is one without scaling:
>
> fit<-prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, factors=2)
> biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)
>
> You can see that the data form 3 groups according to the variables sio2 and dus,
> which seems reasonable: the lowest group has a different value of dus
> ("ano"), while the highest group has a low value of sio2.
>
> But when I do the same with scale=T
>
> fit<-prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, factors=2,
> scale=T)
> biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)
>
> I get a completely different picture, which cannot be interpreted in
> such an easy way.
>
> So can anybody advise me whether I should follow the recommendation from the
> help page, which says
> "The default is FALSE for consistency with S, but in general scaling is
> advisable",
> or whether I should stay with scale = FALSE and the easily interpretable
> result?
I would say the answer depends on the meaning of the variables. In the
unusual case that they are measured in dimensionless units, it might
make sense not to scale. But if you are using arbitrary units of
measurement, do you want your answer to depend on them? For example, if
you change from kg to mg, the numbers will become much larger, the
variable will contribute much more variance, and it will become a more
important part of the largest principal component. Is that sensible?
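
A minimal sketch of the point about units, using made-up data rather than the rglp data above (the variable names mass_kg and length_cm are hypothetical). It only aims to show that an unscaled prcomp() result follows the choice of units, while a scaled one does not:

```r
# Hypothetical data: two measurements on 50 samples.
set.seed(1)
mass_kg   <- rnorm(50, mean = 2,  sd = 0.5)  # mass in kilograms
length_cm <- rnorm(50, mean = 30, sd = 5)    # length in centimetres

in_kg <- data.frame(mass = mass_kg,       length = length_cm)
in_mg <- data.frame(mass = mass_kg * 1e6, length = length_cm)  # same masses, in mg

# Without scaling, PC1 follows whichever column has the larger variance.
unscaled_kg <- prcomp(in_kg)
unscaled_mg <- prcomp(in_mg)
unscaled_kg$rotation[, 1]   # dominated by length
unscaled_mg$rotation[, 1]   # dominated by mass

# With scaling, each column is standardised to unit variance first,
# so the change of units has no effect (up to the sign of the loadings).
scaled_kg <- prcomp(in_kg, scale. = TRUE)
scaled_mg <- prcomp(in_mg, scale. = TRUE)
all.equal(abs(scaled_kg$rotation), abs(scaled_mg$rotation))
```

Whether the scaled or unscaled version is the more meaningful one still depends on what the variables represent.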
Duncan Murdoch