[BioC] Clustering and gene modules
Sean Davis
sdavis2 at mail.nih.gov
Sun Jan 2 03:30:36 CET 2005
----- Original Message -----
From: "Min-Han Tan" <minhan.science at gmail.com>
To: <bioconductor at stat.math.ethz.ch>; <sdavis2 at mail.nih.gov>
Sent: Saturday, January 01, 2005 8:03 PM
Subject: Re: [BioC] Clustering and gene modules
> Thanks for the reply.
>
> Using a package to generate differentially expressed genes is
> possible, but it makes for a large number of preliminary gene lists
> (with 6 subtypes, 41). Dividing the lists into up- and down-expressed
> doubles the number of lists to go through. But I certainly agree that
> this is likely to generate useful results.
You might want to think about using F-statistics (one-way ANOVA). This
lets you look for differential expression in a general sense across all
groups at once, rather than one pairwise comparison at a time. You can then
use something like limma's decideTests to determine which genes are
significant for which tumor subtypes. If you pick genes based on the highest
F-statistics, you will likely end up with your envisioned patchwork of genes
upregulated in one or several groups. You can then look at each of those
clusters.
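
To make the F-statistic idea concrete, here is a minimal sketch in plain
Python (not limma, and not the moderated F-statistic that limma computes).
It calculates the classical one-way ANOVA F-statistic for each gene and
ranks genes by it. The function name one_way_f, the gene names, and the
expression values are all made up for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Classical one-way ANOVA F-statistic for a single gene.

    groups: list of lists, one list of expression values per subtype.
    Assumes at least two groups and some within-group variation.
    """
    k = len(groups)                      # number of subtypes
    n = sum(len(g) for g in groups)      # total number of samples
    grand = mean(v for g in groups for v in g)
    # Variation of the subtype means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of samples around their own subtype mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: gene -> expression values grouped by subtype.
expr = {
    "geneA": [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [1.1, 0.9, 1.0]],
    "geneB": [[2.0, 2.5, 1.5], [2.1, 1.9, 2.4], [1.8, 2.2, 2.0]],
}

# Rank genes from highest to lowest F-statistic.
ranked = sorted(expr, key=lambda g: one_way_f(expr[g]), reverse=True)
print(ranked)
```

Genes at the top of the ranking vary most between subtypes relative to
their within-subtype noise. A follow-up step, such as limma's decideTests,
is still needed to say which subtypes drive each gene.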
> However, I wonder if it is possible (non-statistician here) that
> clusters of genes, insignificant by themselves individually, may
> actually be significant in differentiating subtypes as a group? If so,
> a strategy based on differential expression may leave out clusters of
> genes that may actually be useful.
There is no question that this can be the case. The many different methods
for doing classification deal naturally with this issue. Some classification
techniques will allow you to determine the "weight" of the genes that
contribute to the classification. Classification tries to determine a gene
or group of genes that best distinguishes classes from each other. Note that
this is not necessarily the same set of genes that you find when looking for
differential expression (although there will often be a good deal of
overlap).
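
As one illustration of gene "weights", here is a minimal sketch of a
two-class weighted-voting classifier with signal-to-noise weights, written
in plain Python. It is only one of many possible classifiers and is not
taken from any particular package; fit_weights, predict, the gene names,
and the toy values are hypothetical.

```python
from statistics import mean, stdev


def fit_weights(class_a, class_b):
    """Compute a signal-to-noise weight and decision midpoint per gene.

    class_a, class_b: lists of samples; each sample maps gene -> value.
    Returns {gene: (weight, midpoint)}. Positive weights favor class A.
    """
    model = {}
    for gene in class_a[0]:
        a = [sample[gene] for sample in class_a]
        b = [sample[gene] for sample in class_b]
        weight = (mean(a) - mean(b)) / (stdev(a) + stdev(b))
        midpoint = (mean(a) + mean(b)) / 2
        model[gene] = (weight, midpoint)
    return model


def predict(model, sample):
    """Sum the weighted votes of all genes; the sign picks the class."""
    score = sum(w * (sample[g] - mid) for g, (w, mid) in model.items())
    return "A" if score > 0 else "B"


# Hypothetical training data for two tumor subtypes.
class_a = [{"g1": 5.0, "g2": 2.0}, {"g1": 6.0, "g2": 2.5}, {"g1": 5.5, "g2": 1.5}]
class_b = [{"g1": 2.0, "g2": 2.2}, {"g1": 3.0, "g2": 1.8}, {"g1": 2.5, "g2": 2.3}]

model = fit_weights(class_a, class_b)
for gene, (weight, _) in model.items():
    print(gene, round(weight, 3))
print(predict(model, {"g1": 5.2, "g2": 2.0}))
```

Here g1 gets a much larger absolute weight than g2, so it dominates the
vote. Because each weight is computed gene by gene, this particular scheme
does not capture genes that only matter in combination; multivariate
classifiers are needed for that.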
> For my data, I was hoping ontological labelling of gene clustering
> (only clusters significantly differentiating subtypes) would be a
> viable strategy, since there seems to be some work in this direction
> (goCluster and GeneXpress). I conceptualized the standard heatmap:
> heatmaps depict gene clusters (with the usual clumps of red and
> blue/green); some of these clumps (or "modules") are clearly shared
> across subtypes and some are unique to particular subtypes. Some
> clusters of course are non-subtype specific (e.g. genes predicting
> gender). The working strategy (which is certainly imperfect) is that
> these clumps of genes mean something since they are co-expressed.
I haven't used goCluster, so I'm not sure where it fits above. You are
probably quite right; my notes above are just meant to point out two
"standard" techniques for determining genes that characterize sample
classes. Let's see what other input you get...
Sean