[R] scale or not to scale that is the question - prcomp

Petr PIKAL petr.pikal at precheza.cz
Wed Aug 19 14:31:23 CEST 2009


Dear all,

here is my data, called "rglp":

rglp <- structure(list(vzorek = structure(1:17, .Label = c("179/1/1", 
"179/2/1", "180/1", "181/1", "182/1", "183/1", "184/1", "185/1", 
"186/1", "187/1", "188/1", "189/1", "190/1", "191/1", "192/1", 
"R310", "R610L"), class = "factor"), iep = c(7.51, 7.79, 5.14, 
6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5, 7.3, 7.27, 6.46, 6.95, 
6.32, 6.32, 6.34), skupina = c(7.34, 7.34, 5.14, 6.23, 6.23, 
7.34, 6.23, 7.34, 6.23, 7.34, 7.34, 7.34, 6.23, 7.34, 6.23, 6.23, 
6.23), sio2 = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 
0.898, 0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1), 
    p2o5 = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 
    0.775, 0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 
    2), al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 
    5.828, 4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 
    5), dus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("ano", "ne"), class = 
"factor")), .Names = c("vzorek", 
"iep", "skupina", "sio2", "p2o5", "al2o3", "dus"), class = "data.frame", 
row.names = c(NA, 
-17L))

and I am trying to do a principal component analysis on it. Here is one without scaling:

fit<-prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp)
biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)

You can see that the data form 3 groups according to the variables sio2 and dus, 
which seems reasonable: the lowest group has dus = "ano", while the highest 
group has low values of sio2.

But when I do the same with scale.=TRUE

fit<-prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, scale.=TRUE)
biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)

I get a completely different picture, which cannot be interpreted so 
easily.
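
One way to see why the two fits can differ is to compare the spread of each variable with the share of variance each component carries in both fits. The following is only a minimal sketch: the data frame x is an assumption made here for illustration, holding the numeric columns of rglp copied from above, with dus coded 1 = "ano" and 2 = "ne" (what as.numeric(dus) returns for that factor).

```r
# Numeric columns copied from the rglp data above (x is an illustrative name).
x <- data.frame(
  iep   = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5,
            7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
  sio2  = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898, 0.95,
            0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
  al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828, 4.038,
            5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
  p2o5  = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775, 0.817,
            0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
  dus   = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1)
)

# Standard deviation of each variable on its original scale.
sds <- sapply(x, sd)
print(round(sds, 3))

# Fit both variants on the same columns.
unscaled <- prcomp(x)
scaled   <- prcomp(x, scale. = TRUE)

# Proportion of total variance carried by each component in the two fits.
print(summary(unscaled)$importance["Proportion of Variance", ])
print(summary(scaled)$importance["Proportion of Variance", ])

# Loadings of the first three components, to compare which variables drive them.
print(round(unscaled$rotation[, 1:3], 2))
print(round(scaled$rotation[, 1:3], 2))
```

Without scaling, the total variance is the sum of the raw variances, so variables with a larger spread weigh more; with scale. = TRUE every variable contributes exactly one unit of variance.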

Can anybody advise me whether I should follow the recommendation from the 
help page, which says

  "The default is FALSE for consistency with S, but in general scaling is 
  advisable."

or whether I should stay with scale. = FALSE and the simply interpretable 
result?
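
For reference, the choice between the two settings amounts to choosing between the covariance and the correlation matrix: the component variances from an unscaled prcomp fit are the eigenvalues of cov(x), and those from a scaled fit are the eigenvalues of cor(x). A minimal sketch of that check follows; the data frame x is an assumption made here for illustration, holding the numeric columns of rglp copied from above, with dus coded 1 = "ano" and 2 = "ne".

```r
# Numeric columns copied from the rglp data above (x is an illustrative name).
x <- data.frame(
  iep   = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5,
            7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
  sio2  = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898, 0.95,
            0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
  al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828, 4.038,
            5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
  p2o5  = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775, 0.817,
            0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
  dus   = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1)
)

unscaled <- prcomp(x)                 # centred only
scaled   <- prcomp(x, scale. = TRUE)  # centred and scaled to unit variance

# Eigenvalues of the covariance and correlation matrices (in decreasing order).
ev_cov <- eigen(cov(x))$values
ev_cor <- eigen(cor(x))$values

# Component variances (sdev^2) compared with the eigenvalues.
print(all.equal(unscaled$sdev^2, ev_cov))
print(all.equal(scaled$sdev^2, ev_cor))
```

So the question of whether to scale is the same as asking whether differences in the units and spread of iep, sio2, al2o3, p2o5 and dus should influence the components.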
 
Thank you.

Petr Pikal
petr.pikal at precheza.cz
