[BioC] hclust and (Eisen+ de Hoon) cluster3 program
Antoine Lucas
Antoine.Lucas at cgm.cnrs-gif.fr
Mon Dec 6 14:20:47 CET 2004
Dear all,
I noticed a precision problem (though maybe only in an older version
of Eisen's software) and sent him this remark in April 2002:
--- Old Message ---
I used a simple data set (see below) to understand
hierarchical clustering, and I found the same
results with Maple (not very convenient!), but
with very different precision.
Example (distance: centered correlation, average linkage).
Cluster:
NODE1X GENE15X GENE10X 0.9996337890625
NODE2X GENE20X GENE16X 0.99957275390625
NODE3X GENE14X GENE11X 0.99835205078125
Maple:
Node1: .99959179339780276201
Node2: .99956936766825333998
Node3: .99833748845958267738
I thought that Cluster used double precision, but then
it should give something like 15 correct digits.
Fortunately, the data set was very small and all values had
the same order of magnitude, but a computer scientist
told me that floating-point precision is much
lower when the operands (in addition, subtraction, ...)
differ greatly in size.
--------------
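The remark about operands of very different size can be illustrated with a minimal Python sketch (Python floats are IEEE 754 doubles on common platforms). It shows that a double carries about 15 reliable decimal digits, that adding a small number to a much larger one can lose the small number entirely, and, as a general illustration only (the thread does not say Cluster does this), how much is lost when a value is stored in single precision. The helper `to_float32` is hypothetical, written just for this example.

```python
import struct
import sys


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Decimal digits a double represents reliably (15 for IEEE 754 binary64).
print("double digits:", sys.float_info.dig)

# Absorption: near 1e17 adjacent doubles are 16 apart, so adding 1.0
# is rounded away and the subtraction returns 0.0.
big = 1e17
print("(1e17 + 1) - 1e17 =", (big + 1.0) - big)

# Operands of the same magnitude keep their precision.
print("(1.5 + 1) - 1.5 =", (1.5 + 1.0) - 1.5)

# Single precision keeps only about 7 significant decimal digits.
x = 0.99959179339780276201  # Maple's value for Node1
print("double:", repr(x))
print("single:", repr(to_float32(x)))
```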
Data:
UNIQID NAME GWEIGHT GORDER "V1" "V2" "V3"
EWEIGHT 1 1 1
"A1" 1 1 2 16 18
"A2" 1 2 12 9 7
"A3" 1 3 9 10 4
"A4" 1 4 5 2 12
"A5" 1 5 12 14 7
"A6" 1 6 9 16 10
"A7" 1 7 8 10 10
"A8" 1 8 10 6 6
"A9" 1 9 14 1 28
"A10" 1 10 9 10 23
"A11" 1 11 9 16 27
"A12" 1 12 17 12 37
"A13" 1 13 15 5 23
"A14" 1 14 7 14 29
"A15" 1 15 11 8 29
"A16" 1 16 4 16 37
"A17" 1 17 32 25 34
"A18" 1 18 28 35 30
"A19" 1 19 30 28 23
"A20" 1 20 32 22 28
"A21" 1 21 25 22 26
"A22" 1 22 27 33 26
"A23" 1 23 28 33 31
"A24" 1 24 36 28 31
---
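As a cross-check, the three similarities can be recomputed in ordinary double precision. This minimal Python sketch assumes that the GENEnX labels in Cluster's output are 0-based row indices (so GENE15X is "A16" and GENE10X is "A11") and that "centered correlation" is the usual Pearson correlation; under those assumptions it reproduces Maple's values. It also checks that the three values Cluster reported are exact multiples of 2^-14, which suggests they were rounded to a coarse grid somewhere, though the message does not say where.

```python
import math

# Expression values (V1, V2, V3) from the data above, keyed by UNIQID.
DATA = {
    "A11": (9, 16, 27),
    "A12": (17, 12, 37),
    "A15": (11, 8, 29),
    "A16": (4, 16, 37),
    "A17": (32, 25, 34),
    "A21": (25, 22, 26),
}


def centered_correlation(x, y):
    """Pearson (centered) correlation computed in double precision."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


# Assumption: GENEnX is the 0-based row index (GENE15X -> "A16", etc.).
NODES = {
    "NODE1X": ("A16", "A11", 0.9996337890625),
    "NODE2X": ("A21", "A17", 0.99957275390625),
    "NODE3X": ("A15", "A12", 0.99835205078125),
}

for node, (g1, g2, reported) in NODES.items():
    r = centered_correlation(DATA[g1], DATA[g2])
    grid = reported * 2**14  # a whole number if reported is a multiple of 2^-14
    print(f"{node}: double={r:.17f} cluster={reported} cluster*2^14={grid}")
```

Because each of these nodes merges two single genes, the node similarity is simply the pairwise correlation, so no linkage rule is involved in the comparison.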
On Mon, 06 Dec 2004 13:09:46 +0100
Benjamin Haibe-Kains <bhaibeka at ulb.ac.be> wrote:
> Hi Michael,
>
> I think the differences are too large to be due to different
> implementation decisions. My actual problem is that when I use hclust
> with the 'centroid' method (and cutree to get the two main groups), I
> get one group containing a single object and the rest in the other
> group, which does not happen with other software. It looks like a bug
> in the Fortran routine, but I cannot access it.
>
> Have you reported this "bug" before? Can I easily write my own
> 'centroid' method?
>
> cheers,
>
> benjamin
>
> michael watson (IAH-C) wrote:
>
> >Benjamin
> >
> >You will likely get different results from all clustering software, even
> >when using the same parameters. This is because many arbitrary
> >decisions have to be made during a hierarchical cluster analysis and
> >different programmers will make those decisions in different ways.
> >
> >Mick
> >
> >-----Original Message-----
> >From: bioconductor-bounces at stat.math.ethz.ch
> >[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Benjamin
> >Haibe-Kains
> >Sent: 06 December 2004 11:05
> >To: Bioconductor Mailing List
> >Subject: [BioC] hclust and (Eisen+ de Hoon) cluster3 program
> >
> >
> >Hi all,
> >
> >I have a problem with the R function 'hclust'. I have noticed
> >differences in clustering between 'hclust' and the cluster3 program
> >(see M. Eisen and M. de Hoon) when I use the 'centroid' method.
> >
> >Have you noticed such differences too?
> >
> >I use
> >
> >hclust from package 'stats' (Built: R 2.0.1; i386-pc-linux-gnu;
> >2004-11-15 15:56:06; unix)
> >cluster 3.0 using C Clustering Library version 1.25
> >
> >Thanks a lot
>
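On Benjamin's question of whether a 'centroid' method is easy to write: below is a minimal, unoptimized Python sketch of centroid linkage, assuming squared Euclidean distance between explicit cluster centroids. It is neither the hclust Fortran code nor the C Clustering Library; all function names are hypothetical. R's hclust documentation describes recomputing cluster distances with the Lance-Williams update formula, so the sketch also checks that the centroid form of that update equals the explicit centroid distance when the dissimilarities are squared Euclidean. With other dissimilarities the update need not equal a distance to a recomputed centroid, which is one possible, unverified, source of differences between programs.

```python
from itertools import combinations


def sq_euclid(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def centroid_linkage(data):
    """Naive centroid-linkage clustering.

    Returns merges as (members_a, members_b, distance), where distance is the
    squared Euclidean distance between the two centroids at merge time.
    """
    clusters = {i: [i] for i in range(len(data))}
    merges = []
    while len(clusters) > 1:
        cents = {k: centroid([data[i] for i in m]) for k, m in clusters.items()}
        # Closest pair of centroids; min() keeps the first pair on ties,
        # one of the arbitrary choices every implementation has to make.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: sq_euclid(cents[p[0]], cents[p[1]]))
        d = sq_euclid(cents[a], cents[b])
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


def lance_williams_centroid(d_ki, d_kj, d_ij, n_i, n_j):
    """Lance-Williams centroid update: distance from cluster k to the merged
    cluster i+j, computed from the pre-merge distances and cluster sizes."""
    n = n_i + n_j
    return (n_i / n) * d_ki + (n_j / n) * d_kj - (n_i * n_j / n**2) * d_ij


rows = [(2, 16, 18), (12, 9, 7), (9, 10, 4), (5, 2, 12)]  # A1..A4 from the data
for merge in centroid_linkage(rows):
    print(merge)

# For squared Euclidean distances the update matches the explicit centroid.
k, i, j = rows[0], rows[1], rows[2]
explicit = sq_euclid(k, centroid([i, j]))
updated = lance_williams_centroid(sq_euclid(k, i), sq_euclid(k, j),
                                  sq_euclid(i, j), 1, 1)
print(explicit, updated)
```

Recomputing every centroid at each step keeps the sketch easy to check against hand calculations; a real implementation would update distances incrementally instead.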
--
Antoine Lucas
Centre de génétique Moléculaire, CNRS
91198 Gif sur Yvette Cedex
Tel: (33)1 69 82 38 89
Fax: (33)1 69 82 38 77