[R] To scale or not to scale, that is the question - prcomp
Petr PIKAL
petr.pikal at precheza.cz
Wed Aug 19 14:31:23 CEST 2009
Dear all
here are my data, called "rglp":
rglp <- structure(list(
    vzorek = structure(1:17, .Label = c("179/1/1", "179/2/1", "180/1",
        "181/1", "182/1", "183/1", "184/1", "185/1", "186/1", "187/1",
        "188/1", "189/1", "190/1", "191/1", "192/1", "R310", "R610L"),
        class = "factor"),
    iep = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5,
        7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
    skupina = c(7.34, 7.34, 5.14, 6.23, 6.23, 7.34, 6.23, 7.34, 6.23,
        7.34, 7.34, 7.34, 6.23, 7.34, 6.23, 6.23, 6.23),
    sio2 = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898, 0.95,
        0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
    p2o5 = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775,
        0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
    al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828,
        4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
    dus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
        1L, 2L, 2L, 1L, 1L), .Label = c("ano", "ne"), class = "factor")),
    .Names = c("vzorek", "iep", "skupina", "sio2", "p2o5", "al2o3", "dus"),
    class = "data.frame", row.names = c(NA, -17L))
and I am trying to do a principal component analysis. Here is one without scaling:
fit <- prcomp(~ iep + sio2 + al2o3 + p2o5 + as.numeric(dus), data = rglp)
biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)
You can see that the data form three groups according to the variables sio2
and dus. This seems reasonable, as the lowest group has a different value of
dus (= "ano") while the highest group has low values of sio2.
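A minimal sketch of what the unscaled analysis computes, assuming the data are re-entered by hand as a plain data frame with only the columns used in the formula (vzorek and skupina are left out; the names X, fit_raw and ev_cov are hypothetical). With the default scale. = FALSE, prcomp() works on the covariance matrix, so each variable enters with a weight set by its own variance, which depends on the units it is measured in.

```r
# Sketch (assumption: data re-entered from the dput() output in the post;
# vzorek and skupina omitted because the formula does not use them).
rglp <- data.frame(
  iep   = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5,
            7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
  sio2  = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898, 0.95,
            0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
  al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828, 4.038,
            5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
  p2o5  = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775, 0.817,
            0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
  dus   = factor(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1),
                 labels = c("ano", "ne"))
)

# Hypothetical name X: numeric matrix with the same columns as the formula
X <- cbind(iep = rglp$iep, sio2 = rglp$sio2, al2o3 = rglp$al2o3,
           p2o5 = rglp$p2o5, dus = as.numeric(rglp$dus))

# Column variances: the diagonal of cov(X), i.e. the weights in an unscaled PCA
print(round(apply(X, 2, var), 3))

fit_raw <- prcomp(X)                         # scale. = FALSE is the default
ev_cov  <- eigen(cov(X), symmetric = TRUE)

# Component variances are the eigenvalues of the covariance matrix
stopifnot(isTRUE(all.equal(fit_raw$sdev^2, ev_cov$values)))

# Rotation and component variances together reproduce cov(X)
R <- fit_raw$rotation
stopifnot(isTRUE(all.equal(unname(R %*% diag(fit_raw$sdev^2) %*% t(R)),
                           unname(cov(X)))))
```

Comparing the printed variances with the unscaled biplot may help show which variables dominate it.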
But when I do the same with scale. = TRUE
fit <- prcomp(~ iep + sio2 + al2o3 + p2o5 + as.numeric(dus), data = rglp,
              scale. = TRUE)
biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)
I get a completely different picture, which cannot be interpreted in such an
easy way.
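For comparison, a sketch of what scale. = TRUE changes, under the same assumptions (data re-entered by hand, hypothetical names X, fit_sc, ev_cor and fit_z). Every column is first standardized to unit variance, so the components become the eigenvectors of the correlation matrix and all five variables start with equal weight.

```r
# Sketch (same assumption: data re-entered from the post, formula columns only)
rglp <- data.frame(
  iep   = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5,
            7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
  sio2  = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898, 0.95,
            0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
  al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828, 4.038,
            5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
  p2o5  = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775, 0.817,
            0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
  dus   = factor(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1),
                 labels = c("ano", "ne"))
)
X <- cbind(iep = rglp$iep, sio2 = rglp$sio2, al2o3 = rglp$al2o3,
           p2o5 = rglp$p2o5, dus = as.numeric(rglp$dus))

fit_sc <- prcomp(X, scale. = TRUE)
ev_cor <- eigen(cor(X), symmetric = TRUE)

# With scale. = TRUE the component variances are the eigenvalues of cor(X)
stopifnot(isTRUE(all.equal(fit_sc$sdev^2, ev_cor$values)))

# Equivalent to an unscaled PCA of the standardized columns
fit_z <- prcomp(scale(X))
stopifnot(isTRUE(all.equal(fit_sc$sdev, fit_z$sdev)))

# Proportion of variance per component; the total variance is now ncol(X)
print(summary(fit_sc)$importance)
```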
So can anybody advise me whether I should follow the recommendation on the
help page, which says
"The default is FALSE for consistency with S, but in general scaling is
advisable."
or whether I should stay with scale. = FALSE and its easily interpretable
result?
Thank you.
Petr Pikal
petr.pikal at precheza.cz