[BioC] Clustering and gene modules

Sun Jan 2 03:30:36 CET 2005

----- Original Message ----- 
From: "Min-Han Tan" <minhan.science at gmail.com>
To: <bioconductor at stat.math.ethz.ch>; <sdavis2 at mail.nih.gov>
Sent: Saturday, January 01, 2005 8:03 PM
Subject: Re: [BioC] Clustering and gene modules

> Thanks for the reply.
>
> Using a package to generate differentially expressed genes is
> possible, but it makes for a large number of preliminary gene lists
> (with 6 subtypes, 41). Dividing the lists into up- and down-expressed
> doubles the number of lists to go through. But I certainly agree that
> this is likely to generate useful results.

You might want to think about using F-statistics (i.e., one-way ANOVA).
This lets you look for differential expression in a general sense across
all groups at once.  You can then use something like limma's decideTests
to determine which genes are differentially expressed in which tumor
subtypes.  If you pick genes based on the highest F-statistics, you will
likely end up with your envisioned patchwork of genes upregulated in one
or several groups.  You can then look at each of those clusters.
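
As a rough illustration of the ranking step, here is a minimal sketch in
plain Python of an ordinary one-way ANOVA F-statistic computed per gene.
The gene names, group labels and expression values are made up for the
example, and one_way_f is a hypothetical helper, not a limma function.
limma itself works with moderated statistics, which this sketch does not
attempt to reproduce.

```python
from statistics import mean

def one_way_f(values, groups):
    """Ordinary one-way ANOVA F-statistic for a single gene.

    values: expression values, one per sample
    groups: group (subtype) label for each sample, same length as values
    """
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    k = len(by_group)            # number of groups
    n = len(values)              # number of samples
    grand = mean(values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2
                     for vs in by_group.values())
    # Variation of the samples around their own group mean
    ss_within = sum((v - mean(vs)) ** 2
                    for vs in by_group.values() for v in vs)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data (invented): 3 subtypes, 3 samples each
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
expr = {
    "gene_flat": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 4.9, 5.0],
    "gene_upA":  [9.0, 9.2, 8.9, 5.1, 5.0, 4.8, 5.1, 4.9, 5.0],
    "gene_upBC": [5.0, 5.1, 4.9, 8.0, 8.2, 7.9, 8.1, 7.8, 8.0],
}

f_stats = {gene: one_way_f(vals, groups) for gene, vals in expr.items()}
# Genes with the largest F-statistics come first
ranked = sorted(f_stats, key=f_stats.get, reverse=True)
print(ranked)
```

Note that the F-statistic only says that a gene differs somewhere among
the groups.  Working out which groups differ is the separate follow-up
step described above.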

> However, I wonder if it is possible (non-statistician here) that
> clusters of genes, insignificant by themselves individually, may
> actually be significant in differentiating subtypes as a group? If so,
> a strategy based on differential expression may leave out clusters of
> genes that may actually be useful.

There is no question that this can be the case.  The many different
classification methods deal naturally with this issue.  Some
classification techniques will let you determine the "weight" of each
gene that contributes to the classification.  Classification tries to
find a gene or group of genes that best distinguishes the classes from
each other.  Note that this is NOT the same set of genes that you find
when looking for differential expression (although there will often be a
good deal of overlap).
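
To make the idea of gene "weights" concrete, here is a small sketch in
plain Python using Fisher's linear discriminant for two classes and two
genes.  This is just one of many classification methods, picked because
it fits in a few lines.  The data are invented and fisher_weights is a
hypothetical helper.  Each gene is only weakly informative on its own
(the per-class ranges overlap), but the weighted combination separates
the two classes.

```python
def mean_vec(rows):
    """Per-gene mean over the samples (rows) of one class."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 within-class scatter matrix (sum of outer products of deviations)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_weights(class0, class1):
    """Fisher discriminant weights w = S_w^{-1} (m1 - m0) for two genes."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Pooled within-class scatter [[a, b], [b, d]]
    a = s0[0][0] + s1[0][0]
    b = s0[0][1] + s1[0][1]
    d = s0[1][1] + s1[1][1]
    det = a * d - b * b
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Explicit inverse of the symmetric 2x2 matrix applied to diff
    return [(d * diff[0] - b * diff[1]) / det,
            (-b * diff[0] + a * diff[1]) / det]

# Toy samples (invented) as (gene1, gene2) pairs.  gene1 covers the same
# range in both classes and gene2 overlaps heavily between them.
class0 = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.1), (4.0, 3.9)]
class1 = [(1.0, 0.1), (2.0, 0.9), (3.0, 2.1), (4.0, 2.9)]

w = fisher_weights(class0, class1)
scores0 = [w[0] * g1 + w[1] * g2 for g1, g2 in class0]
scores1 = [w[0] * g1 + w[1] * g2 for g1, g2 in class1]
print("weights:", w)
print("class 0 scores:", scores0)
print("class 1 scores:", scores1)
```

The two weights have opposite signs and similar size, so what the
classifier really uses is the difference between the two genes, which
neither gene shows when tested by itself.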

> For my data, I was hoping ontological labelling of gene clustering
> (only clusters significantly differentiating subtypes) would be a
> viable strategy, since there seems to be some work in this direction
> (goCluster and GeneXpress). I conceptualized the standard heatmap:
> heatmaps depict gene clusters (with the usual clumps of red and
> blue/green), some of these clumps, (or "modules") are clearly shared
> across subtypes and some are unique to particular subtypes. Some
> clusters of course are non-subtype specific (e.g. genes predicting
> gender). The working strategy (which is certainly imperfect) is that
> these clumps of genes mean something since they are co-expressed.

I haven't used goCluster, so I'm not sure where it fits in with the
above.  You are probably quite right; my notes above are only meant to
point out two "standard" techniques for finding genes that characterize
sample classes.  Let's see what other input you get....

Sean