[R] Re: Clustering in R

wmak at brandeis.edu
Thu Jun 17 22:32:19 CEST 2004

Thank you for all the responses.  I've downloaded TIGR MeV, and it seems to do
everything I need it to do.  The only problem is that none of the algorithms
(K-means, hierarchical) seem to work; they give errors that just say -2.  I
think the reason is probably that I'm running Linux (since there aren't any
Windows machines available), and MeV was debugged on Windows.  It seems that
the lab is getting a new Windows machine soon, so at that time I'll be able to
try it out.

In the meantime, I'm going to see if I can get workable data by removing
unexpressed genes and using hclust with cutree to get into each of the
branches.  From the replies I've gotten, this should probably work; I just
need to sit down and try it.  If I get stuck on either of these, I know where
to ask for more help.  At least now I have viable directions.  Thanks again
to everyone who responded; it was enormously helpful.
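
For concreteness, here is a rough sketch of that plan (filter, hclust,
cutree, subset, hclust again), run on simulated data rather than the real
arrays.  The matrix name `expr`, the variance cutoff used to call a gene
"unexpressed", the correlation distance, average linkage, and k = 6 are all
placeholder assumptions, not recommendations from the replies:

```r
# Simulated stand-in for the real data: 200 genes x 6 arrays (hypothetical).
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("array", 1:6)))

# Drop "unexpressed" genes, crudely defined here as low variance across arrays.
# The cutoff (median variance) is an arbitrary assumption.
v    <- apply(expr, 1, var)
keep <- v > median(v)
filt <- expr[keep, , drop = FALSE]

# Hierarchical clustering of genes on correlation distance (one common choice).
d  <- as.dist(1 - cor(t(filt)))
hc <- hclust(d, method = "average")

# Cut the tree into k groups; k = 6 is a placeholder.
grp <- cutree(hc, k = 6)
table(grp)

# Drill into one branch: take the genes of the largest group and recluster
# them, so the sub-tree is small enough for the labels to be legible.
big    <- as.integer(names(which.max(table(grp))))
sub    <- filt[grp == big, , drop = FALSE]
hc_sub <- hclust(as.dist(1 - cor(t(sub))), method = "average")
plot(hc_sub, cex = 0.6)
```

Repeating the last step on a group of `cutree(hc_sub, ...)` would go one level
deeper; `rownames(sub)` gives the gene names in the chosen branch.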


Quoting Martin Maechler <maechler at stat.math.ethz.ch>:

> Thanks a lot, Michael!
> I cc to R-help, where this question really belongs {as the
> 'Subject' suggests itself...} -- please drop 'bioconductor' from
> CC'ing further replies.
> >>>>> "michael" == michael watson (IAH-C) <michael.watson at bbsrc.ac.uk>
> >>>>>     on Thu, 17 Jun 2004 09:16:59 +0100 writes:
>     michael> OK, admittedly it is not incredibly simple, but it
>     michael> is not *that* difficult.
>     michael> If you are familiar with R, it should take you an
>     michael> hour or two; if unfamiliar, perhaps a day or two.
>     michael> The commands you want (and need to read the help on) are:
>     michael> hclust
>     michael> plclust
>     michael> cutree
> and I would add  identify.hclust()   {and rect.hclust()}
> a very neat but not known / used enough function
> a link to which I have just added to the help(hclust) page.
> Look at its examples {not with example() since they are
> "dontrun"} correcting the extraneous "." in the last (and
> coolest!) example!
>     michael> dendrogram
>     michael> as.dendrogram
>     michael> heatmap
> where you use "dendrogram"s produced from "hclust" objects via
> as.dendrogram(<hc-obj>) or also "twins" objects produced from
> package cluster's agnes() or diana() via  
>  as.dendrogram(as.hclust( <twins-obj> ) )
> help(dendrogram)  also mentions  
> "[[" (and shows examples) and cut() for cutting dendrograms and shows
> how you can depict dendrograms into its parts.
>     michael> With intelligent use of hclust -> cutree -> subsetting -> hclust
>     michael> (in that order) you will be able to drill down
>     michael> into your dendrogram and create sub-trees - until
>     michael> you get to the level where you can see your gene
>     michael> names.
> or also
>    hclust -> as.dendrogram -> cut -> ..
> 			   -> [[  ->
> Note that there also is  reorder.dendrogram() for reordering
> dendrogram nodes ``sensibly'' --- something that heatmap() does,
> but you can play with quite a bit.
> Further, note Catherine Hurley's  "gclus" package which
> orders/reorders 'hclust' objects directly, but with a more
> interesting algorithm. 
> Note that I'd strongly recommend to use R 1.9.1 beta for these,
> since I know which bugs in the dendrogram code I have fixed
> since R 1.9.0...
>     michael> An important message to take home here is that if
>     michael> you have 14000 genes and therefore 14000 labels,
>     michael> it's going to be difficult to display your tree in
>     michael> ANY software, including the expensive commercial products.
> not showing the labels and using identify.hclust() and the
> command line to extract the indices of observations in
> clusters (and subclusters) and visualize them in other,
> non-dendrogram plots, might well be feasible.
>     michael> Let me know how you get on
>     michael> Thanks
>     michael> Mick
>     michael> -----Original Message-----
>     michael> From: wmak at brandeis.edu [mailto:wmak at brandeis.edu]
>     michael> Sent: 16 June 2004 21:26
>     michael> To: bioconductor at stat.math.ethz.ch
>     michael> Subject: [BioC] Clustering in R
>     >> Dear list members,
>     >> I'm an undergrad and I work in a lab at Brandeis.
>     >> I am trying to cluster around 14,000 genes across 6
>     >> microarray experiments.  Two of these experiments
>     >> are replicates.  I have decided to use R since it
>     >> seems to be the most complete and flexible software
>     >> package for normalization and clustering of
>     >> microarray data.
>     >> The problem is that I am new to clustering and to
>     >> R.  Just to mention a few of the problems I'm
>     >> having: the dendrogram that is drawn by R from the
>     >> agnes object is far too dense to see any of the
>     >> gene names; kmeans won't work, returning an error
>     >> saying that my data has NAs in it (there weren't
>     >> any missing values in the original table though);
>     >> I'd like to be able to see a heatmap or a
>     >> cumulative plot of expression profiles for genes
>     >> that are clustered together or are on the same
>     >> branch of the dendrogram.
>     >> I know that these questions are probably very
>     >> simple, but I can't seem to find the answer to them
>     >> online or in the documentation.  If anyone can
>     >> answer these questions or direct me toward
>     >> resources that deal with clustering in R or
>     >> BioConductor, a basic tutorial that takes a
>     >> practical approach to it, I would really appreciate
>     >> it.  Any other reading material that isn't too
>     >> heavy on statistics that deals with clustering for
>     >> that matter, would be very helpful.
>     >> Thank you in advance,
>     >> Wayne Mak
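
Two points from the quoted thread, sketched on simulated data.  The first part
checks a matrix for non-finite values before calling kmeans, since NA, NaN or
Inf entries make it stop with an error; such values can appear after
processing steps such as log-transforming zeros or negatives even when the raw
table has no gaps (whether that is what happened here is only a guess).  The
second part follows the hclust -> as.dendrogram -> cut -> [[ route.  The
object names, the injected NA, centers = 4, and the cut height are all
hypothetical:

```r
set.seed(2)
m <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(paste0("gene", 1:50), NULL))
m[3, 2] <- NA                        # inject a missing value for illustration

# Find rows with any NA, NaN or Inf and drop them before clustering.
ok <- apply(m, 1, function(r) all(is.finite(r)))
sum(!ok)                             # number of problem rows
clean <- m[ok, , drop = FALSE]
km <- kmeans(clean, centers = 4)     # centers = 4 is a placeholder

# Dendrogram route: cut at a height, then pull out one branch with [[.
hc    <- hclust(dist(clean))
dend  <- as.dendrogram(hc)
h     <- mean(range(hc$height))      # arbitrary cut height inside the tree
parts <- cut(dend, h = h)            # list with components $upper and $lower

# Pick the largest sub-tree below the cut and look at its gene names.
sizes  <- sapply(parts$lower, function(b) attr(b, "members"))
branch <- parts$lower[[which.max(sizes)]]
labels(branch)
plot(branch)
```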
