[Rd] cluster - clusplot.default (PR#1249)

maechler@stat.math.ethz.ch maechler@stat.math.ethz.ch
Thu, 10 Jan 2002 09:25:21 +0100 (MET)

>>>>> "kjetil" == kjetil halvorsen <kjetilh@umsanet.edu.bo> writes:

    kjetil> The following code in clusplot.default (package cluster) is in error:

    kjetil>    x1 <- cmdscale(x, k = 2, eig = TRUE)
    kjetil>    var.dec <- sum(x1$eig)/sum(diag(x1$x))
    kjetil>    if (var.dec < 0)  var.dec <- 0
    kjetil>    if (var.dec > 1)  var.dec <- 1
    kjetil>    x1 <- x1$points

    kjetil> x1 has components with names "points" and "eig", not
    kjetil> "x", so sum(diag(x1$x)) returns 0, the division
    kjetil> gives Inf which is later replaced by 1.  So in the
    kjetil> plot it is reported (always) that "These two
    kjetil> components explain 100% of the variability".
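
The mechanism described above can be reproduced with a small sketch. It
uses a hand-built list as a stand-in for the cmdscale() result; the
eigenvalues and points are made up for illustration, and only the
component names "points" and "eig" are taken from the report.

```r
# Stand-in for the result of cmdscale(x, k = 2, eig = TRUE) as described
# in the report: components "points" and "eig" only (values are made up).
x1 <- list(points = matrix(0, nrow = 3, ncol = 2),
           eig    = c(2.5, 1.0))

# There is no component named "x", so x1$x is NULL.
is.null(x1$x)

# diag() of NULL has no entries, so the sum over it is 0.
denom <- sum(diag(x1$x))

# A positive numerator divided by 0 gives Inf, which the clamping
# below turns into 1.
var.dec <- sum(x1$eig) / denom
if (var.dec < 0) var.dec <- 0
if (var.dec > 1) var.dec <- 1
var.dec
```

With any positive eigenvalue sum the result is the same, which is why the
plot always reports 100%.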

Thank you Kjetil.
Yes, there's definitely a problem there.
However, the fix is not that easy: the replacement you suggest
is not enough, since var.dec would still not be scaled to [0,1].
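
One way to get a ratio that is scaled to [0,1] is to divide the two
leading eigenvalues by the trace of the doubly centred squared-distance
matrix. This is only a sketch under assumptions: the toy data and the
names X, D2, J, B and ev are made up here, and it is not necessarily what
clusplot.default will end up doing.

```r
# Toy data (an assumption for illustration): 10 points in 4 dimensions.
set.seed(1)
X <- matrix(rnorm(40), nrow = 10)
d <- dist(X)                        # Euclidean distances

# Doubly centred matrix B = -1/2 * J %*% D^2 %*% J of classical scaling.
D2 <- as.matrix(d)^2
n  <- nrow(D2)
J  <- diag(n) - matrix(1 / n, n, n)
B  <- -0.5 * J %*% D2 %*% J

# Eigenvalues of B in decreasing order; their sum equals the trace of B.
ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values

# Share of the total explained by the first two components.
var.dec <- sum(ev[1:2]) / sum(diag(B))

# For non-Euclidean dissimilarities B can have negative eigenvalues and
# the ratio can leave [0,1], so it is still clamped.
var.dec <- min(max(var.dec, 0), 1)
var.dec
```

For Euclidean distances, as here, B is positive semi-definite, so the
ratio lies in [0,1] even before clamping.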

Before the lines you cite above, there is

        ##x1 <- cmd(x, k = 2, eig = T, add = T)
        ##if(x1$ac < 0)
        ##	x1 <- cmd(x, k = 2, eig = T)

which was Rousseeuw et al.'s original code, in place of the
  x1 <- cmdscale(...) line above.
cmd() was an internal function that called directly into
undocumented S-plus internal Fortran code.
The original porter of the cluster package replaced
cmd() by cmdscale(), which seemed ok but was not.

I'll have a look.

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><