[BioC] Re: [S] Error in clustering procedure
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Mon Sep 13 11:08:34 CEST 2004
Great, that's what I was looking for!
Personally, I use cluster analysis sparingly and as a very "exploratory"
tool.
I think, though I may be wrong, that most biologists realise its
limitations.
I also think that it is not "completely useless", and perhaps if people
do think a method is useless, they should suggest an alternative, which
you have. Thank you!
M
-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: 13 September 2004 10:03
To: michael watson (IAH-C)
Cc: Ramon Diaz-Uriarte; bioconductor at stat.math.ethz.ch; cstrato; James
W. MacDonald
Subject: RE: [BioC] Re: [S] Error in clustering procedure
On Mon, 13 Sep 2004, michael watson (IAH-C) wrote:
> I guess I'm coming to this late,
You are, yet have overlooked important points in later parts of the
thread.
> but I'm pretty sure all that biologists use
> cluster analysis for is finding out which genes are behaving
> similarly to one another in a large data set.
Really? Have you never seen a heatmap with clustering on the margins?
There clustering is being used for seriation.
> Then if, for example, all
> genes from a certain pathway are showing a similar expression pattern,
> we have a hypothesis which can be tested further.
>
> If cluster analysis has indeed been "over-sold", please suggest a
> better algorithm for summarising groups of genes that are behaving
> similarly across a group of experiments or time-points :-)
My point was about methods/algorithms for cluster analysis, as I have
already replied in this thread.
But MDS-like methods (note, not algorithms) are better for your stated
purpose.
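To make "MDS-like" concrete, here is a minimal sketch of classical (Torgerson) scaling applied to genes. It is one member of that family, not a method prescribed anywhere in this thread. The toy expression matrix, the function names and the choice of sqrt(2 * (1 - r)) as the dissimilarity between gene profiles are all assumptions made for illustration.

```python
import numpy as np


def correlation_distance(expr):
    """Dissimilarity between rows (genes): sqrt(2 * (1 - Pearson r)).

    Assumption: this particular dissimilarity is an illustrative choice.
    """
    r = np.corrcoef(expr)
    # Clip guards against tiny negative values caused by rounding.
    return np.sqrt(np.clip(2.0 * (1.0 - r), 0.0, None))


def classical_mds(dist, n_components=2):
    """Classical MDS from a square matrix of pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are truncated to zero.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy data: 6 genes x 5 arrays, two opposite expression patterns.
rng = np.random.default_rng(0)
pattern_up = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pattern_down = pattern_up[::-1]
expr = np.vstack(
    [pattern_up + rng.normal(0.0, 0.1, 5) for _ in range(3)]
    + [pattern_down + rng.normal(0.0, 0.1, 5) for _ in range(3)]
)

# One row of 2-D coordinates per gene; similar genes land close together.
coords = classical_mds(correlation_distance(expr))
```

Plotting `coords` gives a map in which genes with similar profiles sit near one another, without forcing them into a tree or a fixed number of groups.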
>
> M
>
> -----Original Message-----
> From: Ramon Diaz-Uriarte [mailto:rdiaz at cnio.es]
> Sent: 08 September 2004 09:33
> To: bioconductor at stat.math.ethz.ch
> Cc: Prof Brian Ripley; cstrato; James W. MacDonald
> Subject: Re: [BioC] Re: [S] Error in clustering procedure
>
>
> On Tuesday 07 September 2004 21:17, cstrato wrote:
> > Dear all
> >
> > First of all, I want to apologize to Prof. Ripley, since I forgot to
> > ask him first for permission to publish his comment.
> >
> > Personally, I agree that this would be useless, as Prof. Ripley has
> > already told me some years ago. However, almost everybody still seems
> > to do it and publish the corresponding results. Companies such as
> > Spotfire are proud that you can do hierarchical clustering with more
> > than 20,000 genes. I have never seen a publication where it was done
> > differently.
>
>
> Part of this could be the result of imitative behavior, beliefs that
> "unless I put a neat heatmap I won't get it past reviewers", etc, but not
> evidence that it is the best way to go. If several companies are making an
> issue out of it in their brochures, maybe it is because customers ask for
> clustering. As for "publish the corresponding results" I am not sure what
> the "results" are, since after clustering 7000 genes you can almost always
> make up a story after the fact; but I would not call that a result.
>
> I think clustering (and biclustering) do have a place, but I guess they
> should be used as a tool to answer some question (e.g., I think I understand
> what question a t-test is helping to answer; I am not sure about most
> clustering procedures), or as a guidance for something, not as some kind of
> magic tool to "let the data speak for themselves" ( = a) get the microarray
> data; b) run a clustering procedure; c) come up with a question that your
> cluster "answered".)
>
> R.
>
>
> >
> > I think that the bioconductor list would be the best forum to
> > discuss
> > this issue, and provide solutions (besides the obvious suggestion to
> > filter non-varying genes).
> >
> > Best regards
> > Christian
> >
> > James W. MacDonald wrote:
> > > cstrato wrote:
> > >> Sorry, but I cannot resist:
> > >>
> > > >> Any comments from the microarray community on the usefulness of
> > >> hierarchical clustering of 7000 genes?
> > >
> > > I think this would be almost completely useless. First off,
> > > clustering is not an inferential technique, so its use has been
> > > completely oversold IMO to the biological community. Secondly,
> > > > clustering is usually done to produce a 'heat map' to put in a paper
> > > > or flash on the screen during a presentation. How on earth would
> > > this be of any use? You couldn't even read any of the gene names!
> > >
> > > Of course you could use the heatmap to impress friends and
> > > > colleagues with the fact that you rate a computer powerful enough to
> > > > *do* a heatmap with a 7000 x 5 matrix ;-D
> > >
> > > Jim
> > >
> > >> Best regards
> > >> Christian
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > >> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> > >> V.i.e.n.n.a. .A.u.s.t.r.i.a
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595