[Rd] hclust: median, centroid (PR#4195)
kleiweg at let.rug.nl
Tue Sep 16 23:23:17 MEST 2003
There seems to be a bug in hclust (package mva) for the clustering
methods 'median' and 'centroid'.
I have written a clustering program in C and discovered that its
results for 'median' differ from those of hclust in R. A third
program, written by someone else in Pascal, agrees with the output
of my program.
I found yet another clustering program that seems to be built on
the same Fortran code as hclust. Its source mentions a bug in the
original code that affects both methods 'median' and 'centroid'.
This program contains a fix for the bug, but I can find no similar
fix in the code of R's hclust.
You can find the program with the fix at:
http://www2.biology.ualberta.ca/jbrzusto/ftp/trees/source.zip
The relevant file is: qclust.c
The bug is mentioned at line 670 of that code.
The fix for the bug starts at line 908.
Unfortunately, I do not know Fortran, so I cannot offer a tested
solution for hclust. I hope I have located the problem accurately
enough for others to take it further.
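For reference, both methods depend on the same step: the Lance-Williams update of the dissimilarity between a cluster k and the cluster formed by merging i and j. The median coefficients are (1/2, 1/2, -1/4) and the centroid coefficients are (n_i/n, n_j/n, -n_i*n_j/n^2) with n = n_i + n_j, both intended for squared Euclidean dissimilarities. Below is a minimal C sketch under those assumptions. It is not R's hclust code and not the fix in qclust.c, and the names lw_update and cluster_heights are hypothetical. The one detail it makes explicit is that every update in a merge step must use d(i,j) as it was before that step overwrote anything. Whether this is the bug described in qclust.c is not confirmed here.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical names; an illustration only, not hclust's Fortran code. */
typedef enum { METHOD_MEDIAN, METHOD_CENTROID } lw_method;

/* Lance-Williams update: dissimilarity between cluster k and the new
   cluster formed by merging i and j. d_ki, d_kj and d_ij must be the
   values from BEFORE this merge; ni and nj are the sizes of i and j.
   Assumes squared Euclidean dissimilarities. */
static double lw_update(lw_method m, double d_ki, double d_kj, double d_ij,
                        int ni, int nj)
{
    if (m == METHOD_MEDIAN)
        return 0.5 * d_ki + 0.5 * d_kj - 0.25 * d_ij;
    double n = (double)(ni + nj);
    assert(n > 0.0);
    return (ni * d_ki + nj * d_kj) / n - (double)ni * nj * d_ij / (n * n);
}

/* Naive O(n^3) agglomerative clustering. d is an n*n symmetric,
   row-major matrix of dissimilarities and is overwritten. Writes the
   n-1 merge heights to heights in merge order (not sorted). Returns 0
   on success, -1 if allocation fails. */
static int cluster_heights(lw_method m, double *d, int n, double *heights)
{
    if (n < 2)
        return 0;
    int *size = malloc(n * sizeof *size);
    int *active = malloc(n * sizeof *active);
    if (!size || !active) {
        free(size);
        free(active);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        size[i] = 1;
        active[i] = 1;
    }

    for (int step = 0; step < n - 1; step++) {
        /* Find the closest pair of active clusters. */
        int bi = -1, bj = -1;
        double best = DBL_MAX;
        for (int i = 0; i < n; i++) {
            if (!active[i])
                continue;
            for (int j = i + 1; j < n; j++) {
                if (active[j] && d[i * n + j] < best) {
                    best = d[i * n + j];
                    bi = i;
                    bj = j;
                }
            }
        }
        heights[step] = best;

        /* best holds d(bi,bj) captured before any entry is rewritten,
           so every update in this step sees the pre-merge value. */
        for (int k = 0; k < n; k++) {
            if (!active[k] || k == bi || k == bj)
                continue;
            double v = lw_update(m, d[k * n + bi], d[k * n + bj], best,
                                 size[bi], size[bj]);
            d[k * n + bi] = v;
            d[bi * n + k] = v;
        }
        size[bi] += size[bj];
        active[bj] = 0; /* the merged cluster keeps index bi */
    }
    free(size);
    free(active);
    return 0;
}
```

With the points 0, 1, 3 and 10 on a line, given as squared distances, both methods first merge at heights 1 and 6.25; the last height is 68.0625 for median and 676/9 (about 75.11) for centroid. hclust's heights are on the scale of the dissimilarities it receives, so a comparison with sort(hclust(d, method="median")$height) is only meaningful if both programs get the same input, and the heights should be sorted first because these two methods can produce non-monotone merge heights.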
You can find a data set to test this bug at:
http://www.let.rug.nl/~kleiweg/R/data
If you source this file and then run:
sort(hclust(d, method="median")$height)
... you will see that the last value in the list is:
0.08449670
The correct value should be:
0.081786
--please do not edit the information below--
Version:
platform = i686-pc-linux-gnu
arch = i686
os = linux-gnu
system = i686, linux-gnu
status =
major = 1
minor = 7.1
year = 2003
month = 06
day = 16
language = R
Search Path:
.GlobalEnv, package:methods, package:ctest, package:mva, package:modreg, package:nls, package:ts, Autoloads, package:base