[BioC] Re: [S] Error in clustering procedure

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Sep 8 11:40:50 CEST 2004


Please note my comment was not about the usefulness of clustering or even
of hierarchical clustering, but about the sub-optimality of 
*agglomerative* clustering on large sets.

If you think you need clustering with thousands of objects, there are, in my
experience, always better ways to achieve the real objective than
agglomerative clustering.  Typically people are looking for a few large
clusters or outliers or many small clusters within already known larger
groupings. In the case of a heatmap, clustering is being used to produce a
1D MDS (a seriation) for which better methods are known.
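A small illustration of the seriation point follows. It is a sketch under stated assumptions, not a recommendation from this thread. For Euclidean distances, classical (Torgerson) MDS to one dimension gives the same coordinate as the first principal-component score. A 1D ordering of genes can therefore be read off a thin SVD without building an n x n distance matrix or a dendrogram. This is one simple seriation, chosen here for illustration, and is not necessarily one of the "better methods" referred to above. The helper name `seriate_pc1` is hypothetical, and NumPy is assumed.

```python
import numpy as np


def seriate_pc1(X):
    """Hypothetical helper: order the rows of X (e.g. genes x arrays) in 1D.

    Uses the first principal-component score, which for Euclidean
    distances equals the 1-D classical (Torgerson) MDS coordinate.
    Returns (order, coord). The sign of coord is arbitrary, so the
    order may come out reversed.
    """
    X = np.asarray(X, dtype=float)
    # Centre each column (array) so the SVD yields principal components
    Xc = X - X.mean(axis=0)
    # Thin SVD of an n x p matrix: cost grows linearly in n (genes)
    # when p (arrays) is small, and no n x n distance matrix is formed
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # First PC score = 1-D classical MDS coordinate (up to sign)
    coord = U[:, 0] * s[0]
    return np.argsort(coord), coord


# Toy data: five "genes" lying on a line, given in shuffled order
X = np.array([[3, 6], [0, 0], [4, 8], [1, 2], [2, 4]], dtype=float)
order, coord = seriate_pc1(X)
# order puts the rows in line order (possibly reversed)
```

The resulting order could then be used to arrange the rows of a heatmap in place of a dendrogram's leaf order.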

BDR

On Wed, 8 Sep 2004, Ramon Diaz-Uriarte wrote:

> On Tuesday 07 September 2004 21:17, cstrato wrote:
> > Dear all
> >
> > First of all, I want to apologize to Prof. Ripley, since I forgot
> > to ask him first for permission to publish his comment.
> >
> > Personally, I agree that this would be useless, as Prof. Ripley
> > has already told me some years ago. However, almost everybody
> > still seems to do it and publish the corresponding results.
> > Companies such as Spotfire are proud that you can do hierarchical
> > clustering with more than 20,000 genes.
> > I have never seen a publication where it was done differently.
> 
> 
> Part of this could be the result of imitative behavior, beliefs that "unless I 
> include a neat heatmap I won't get it past reviewers", etc., rather than of 
> evidence that it is the best way to go. If several companies are making an issue 
> out of it in their brochures, maybe it is because customers ask for clustering. 
> As for "publish the corresponding results", I am not sure what the "results" are, 
> since after clustering 7000 genes you can almost always make up a story after 
> the fact, but I would not call that a result. 
> 
> I think clustering (and biclustering) do have a place, but I guess they should 
> be used as a tool to answer some question (e.g., I think I understand what 
> question a t-test is helping to answer; I am not sure about most clustering 
> procedures), or as guidance for something, not as some kind of magic tool 
> to "let the data speak for themselves" (i.e., a) get the microarray data; b) run 
> a clustering procedure; c) come up with a question that your cluster 
> "answered").
> 
> R.
> 
> 
> >
> > I think that the bioconductor list would be the best forum to
> > discuss this issue, and provide solutions (besides the obvious
> > suggestion to filter non-varying genes).
> >
> > Best regards
> > Christian
> >
> > James W. MacDonald wrote:
> > > cstrato wrote:
> > >> Sorry, but I cannot resist:
> > >>
> > >> Any comments from the microarray community on the usefulness of
> > >> hierarchical clustering of 7000 genes?
> > >
> > > I think this would be almost completely useless. First off, clustering
> > > is not an inferential technique, so its use has been completely oversold
> > > IMO to the biological community. Secondly, clustering is usually done to
> > > produce a 'heat map' to put in a paper or flash on the screen during a
> > > presentation. How on earth would this be of any use? You couldn't even
> > > read any of the gene names!
> > >
> > > Of course you could use the heatmap to impress friends and colleagues
> > > with the fact that you rate a computer powerful enough to *do* a heatmap
> > > with a 7000 x 5 matrix ;-D
> > >
> > > Jim
> > >
> > >> Best regards
> > >> Christian
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > >> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> > >> V.i.e.n.n.a.         .A.u.s.t.r.i.a
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


