[Rd] cluster - clusplot.default (PR#1249)
maechler@stat.math.ethz.ch
Thu, 10 Jan 2002 09:25:21 +0100 (MET)
>>>>> "kjetil" == kjetil halvorsen <kjetilh@umsanet.edu.bo> writes:
kjetil> The following code in clusplot.default (package cluster) is in error:
kjetil> x1 <- cmdscale(x, k = 2, eig = TRUE)
kjetil> var.dec <- sum(x1$eig)/sum(diag(x1$x))
kjetil> if (var.dec < 0) var.dec <- 0
kjetil> if (var.dec > 1) var.dec <- 1
kjetil> x1 <- x1$points
kjetil> x1 has components with names "points" and "eig", not
kjetil> "x", so sum(diag(x1$x)) returns 0, the division
kjetil> gives Inf which is later replaced by 1. So in the
kjetil> plot it is reported (always) that "These two
kjetil> components explain 100% of the variability".
Thank you Kjetil.
Yes, there's definitely a problem there.
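For the record, the symptom can be reproduced with a few lines. This is
only a sketch: the toy data are made up, and the `denom` line is a
stand-in for the sum(diag(x1$x)) term, which per the report evaluates
to 0.

```r
# Toy dissimilarities (made-up data); any "dist" object would do.
d <- dist(matrix(c(0, 0, 3, 0, 0, 4, 3, 4), ncol = 2, byrow = TRUE))
x1 <- cmdscale(d, k = 2, eig = TRUE)

# With the call above there is no usable 'x' component.
is.null(x1$x)   # TRUE

# Stand-in for sum(diag(x1$x)), which the report says evaluates to 0.
denom <- if (is.null(x1$x)) 0 else sum(diag(x1$x))

var.dec <- sum(x1$eig) / denom      # positive / 0 gives Inf
if (var.dec < 0) var.dec <- 0
if (var.dec > 1) var.dec <- 1
var.dec                             # clamped to 1, i.e. "100%"
```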
However, the fix is not that easy: doing the replacement you
suggest is not enough, since var.dec would still not be scaled to [0,1].
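To illustrate what a properly scaled quantity could look like, here is a
minimal sketch. It assumes one common convention for classical scaling:
the share of the k largest eigenvalues of the doubly centred matrix among
all positive eigenvalues. The helper name prop_explained() is
hypothetical, and this is not necessarily what the original cmd() code
computed nor the fix that will go into cluster.

```r
# Hypothetical helper, not part of the cluster package.
# d: a "dist" object (or symmetric dissimilarity matrix); k: number of axes.
prop_explained <- function(d, k = 2) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)   # centring matrix
  B <- -0.5 * J %*% D2 %*% J           # doubly centred matrix
  # eigen() returns the eigenvalues in decreasing order
  ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values
  pos <- ev[ev > 0]                    # assumption: drop negative eigenvalues
  sum(pos[seq_len(min(k, length(pos)))]) / sum(pos)
}

# Made-up points in three dimensions, so two axes cannot explain everything.
pts3 <- rbind(c(0, 0, 0), c(1, 0, 0), c(0, 2, 0), c(0, 0, 3), c(1, 2, 3))
p3 <- prop_explained(dist(pts3))       # strictly between 0 and 1

# Made-up points in the plane: two axes recover the full configuration.
pts2 <- rbind(c(0, 0), c(3, 0), c(0, 4), c(3, 4))
p2 <- prop_explained(dist(pts2))       # 1, up to rounding error
```

By construction the ratio lies in [0,1], so no clamping is needed. It is
NaN when there are no positive eigenvalues (e.g. all points identical),
a case the sketch does not handle.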
Before the lines you cite above, there is
##x1 <- cmd(x, k = 2, eig = T, add = T)
##if(x1$ac < 0)
## x1 <- cmd(x, k = 2, eig = T)
which was Rousseeuw et al's original code -- instead of the
x1 <- cmdscale(...) line above.
cmd() was an internal function that called directly into
undocumented S-plus internal Fortran code.
The original porter of the cluster package replaced cmd()
with cmdscale(), which seemed OK but was not.
I'll have a look.
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><