[BioC] Re: [S] Error in clustering procedure

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Sep 13 11:03:14 CEST 2004


On Mon, 13 Sep 2004, michael watson (IAH-C) wrote:

> I guess I'm coming to this late, 

You are, yet have overlooked important points in later parts of the 
thread.

> but I'm pretty sure what all biologists use
> cluster analysis for is finding out which genes are behaving
> similarly to one another in a large data set.  

Really?  Have you never seen a heatmap with clustering on the margins?
There clustering is being used for seriation.
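As a rough illustration of clustering used only for seriation (this is not part of the original message, and it is in Python rather than R), the sketch below reorders the rows and columns of an expression matrix by the leaf order of an average-linkage dendrogram. The function name `seriate` and the choice of average linkage with Euclidean distance are assumptions. The only output is a new ordering of the heatmap, not a set of groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def seriate(X, method="average"):
    """Reorder rows and columns of X by dendrogram leaf order (hypothetical helper).

    The clustering is used only to choose a display order for a heatmap;
    no cluster memberships are returned or interpreted.
    """
    X = np.asarray(X, dtype=float)
    # Hierarchical clustering of rows (e.g. genes), Euclidean distance by default
    row_order = leaves_list(linkage(X, method=method))
    # Hierarchical clustering of columns (e.g. arrays or time points)
    col_order = leaves_list(linkage(X.T, method=method))
    # Permute rows and columns to obtain the seriated matrix
    return X[np.ix_(row_order, col_order)], row_order, col_order
```

The returned permutations can be passed to any heatmap plotting routine. The dendrogram matters only through the order it induces.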

> Then if, for example, all
> genes from a certain pathway are showing a similar expression pattern,
> we have a hypothesis which can be tested further.
> 
> If cluster analysis has indeed been "over-sold", please suggest a better
> algorithm for summarising groups of genes that are behaving similarly
> across a group of experiments or time-points :-)

My point was about methods/algorithms for cluster analysis, as I have
already replied in this thread.

But MDS-like methods (note, not algorithms) are better for your stated 
purpose.
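As one concrete example of an MDS-like method (again not from the original message, and sketched in Python rather than R), classical (Torgerson) metric scaling places genes as points in a low-dimensional space. In that space, genes with similar profiles across experiments lie close together, and no partition into clusters is imposed. The function name `classical_mds` and the use of Euclidean distance between expression profiles are assumptions made for illustration.

```python
import numpy as np

def classical_mds(X, k=2):
    """Classical (Torgerson) MDS of the rows of X into k dimensions (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between rows (e.g. gene expression profiles)
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Double-centre the squared distances: B = -1/2 * J D2 J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # eigh returns eigenvalues in ascending order; keep the k largest
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by rounding before taking square roots
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)
```

Plotting the first two coordinates gives a map of the genes in which similar behaviour appears as proximity. This avoids the arbitrary cut of a dendrogram into groups.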

> 
> M
> 
> -----Original Message-----
> From: Ramon Diaz-Uriarte [mailto:rdiaz at cnio.es] 
> Sent: 08 September 2004 09:33
> To: bioconductor at stat.math.ethz.ch
> Cc: Prof Brian Ripley; cstrato; James W. MacDonald
> Subject: Re: [BioC] Re: [S] Error in clustering procedure
> 
> 
> On Tuesday 07 September 2004 21:17, cstrato wrote:
> > Dear all
> >
> > First of all, I want to apologize to Prof. Ripley, since I forgot to 
> > ask him first for permission to publish his comment.
> >
> > Personally, I agree that this would be useless, as Prof. Ripley 
> > told me some years ago. However, almost everybody still seems 
> > to do it and publish the corresponding results. Companies such as 
> > Spotfire are proud that you can do hierarchical clustering with more 
> > than 20,000 genes. I have never seen a publication where it was done 
> > differently.
> 
> 
> Part of this could be the result of imitative behavior, beliefs that "unless I 
> put a neat heatmap I won't get it past reviewers", etc., but not evidence that 
> it is the best way to go. If several companies are making an issue out of it 
> in their brochures, maybe it is because customers ask for clustering.  As for 
> "publish the corresponding results" I am not sure what the "results" are, 
> since after clustering 7000 genes you can almost always make up a story after 
> the fact; but I would not call that a result. 
> 
> I think clustering (and biclustering) do have a place, but I guess they should 
> be used as a tool to answer some question (e.g., I think I understand what 
> question a t-test is helping to answer; I am not sure about most clustering 
> procedures), or as guidance for something, not as some kind of magic tool 
> to "let the data speak for themselves" ( = a) get the microarray data; b) run 
> a clustering procedure; c) come up with a question that your cluster 
> "answered".)
> 
> R.
> 
> 
> >
> > I think that the bioconductor list would be the best forum to discuss 
> > this issue, and provide solutions (besides the obvious suggestion to 
> > filter non-varying genes).
> >
> > Best regards
> > Christian
> >
> > James W. MacDonald wrote:
> > > cstrato wrote:
> > >> Sorry, but I cannot resist:
> > >>
> > >> Any comments from the microarray community on the usefulness of 
> > >> hierarchical clustering of 7000 genes?
> > >
> > > I think this would be almost completely useless. First off, 
> > > clustering is not an inferential technique, so its use has been 
> > > completely oversold IMO to the biological community. Secondly, 
> > > clustering is usually done to produce a 'heat map' to put in a paper 
> > > or flash on the screen during a presentation. How on earth would 
> > > this be of any use? You couldn't even read any of the gene names!
> > >
> > > Of course you could use the heatmap to impress friends and 
> > > colleagues with the fact that you have a computer powerful enough to 
> > > *do* a heatmap with a 7000 x 5 matrix ;-D
> > >
> > > Jim
> > >
> > >> Best regards
> > >> Christian
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > >> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> > >> V.i.e.n.n.a.         .A.u.s.t.r.i.a
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch 
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


