[BioC] Clustering and gene modules

Sun Jan 2 02:03:04 CET 2005

Thanks for the reply. 

Using a package to generate differentially expressed genes is
possible, but it produces a large number of preliminary gene lists
(41 with 6 subtypes). Dividing each list into up- and down-regulated
genes doubles the number of lists to go through. But I certainly agree
that this is likely to generate useful results.
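
To make the count concrete, here is a minimal Python sketch of one way
to arrive at 41. This is an assumption about how the number was
counted (the post does not say): every subset of 1, 2 or 3 subtypes
compared against the remaining subtypes. The labels A-F are
placeholders.

```python
from itertools import combinations
from math import comb

subtypes = ["A", "B", "C", "D", "E", "F"]  # placeholder labels for the 6 subtypes

# Every subset of 1, 2 or 3 subtypes, each compared against the rest.
subsets = [s for k in (1, 2, 3) for s in combinations(subtypes, k)]
n_subsets = len(subsets)  # 6 + 15 + 20

# A 3-vs-3 split appears twice in that list (ABC vs DEF and DEF vs ABC),
# so the number of distinct "group vs rest" splits is smaller.
everything = frozenset(subtypes)
distinct_splits = {frozenset([frozenset(s), everything - frozenset(s)]) for s in subsets}
n_distinct = len(distinct_splits)

# Plain subtype-vs-subtype comparisons, for reference.
n_pairwise = comb(len(subtypes), 2)

print("subsets of size 1-3:", n_subsets)
print("distinct group-vs-rest splits:", n_distinct)
print("pairwise comparisons:", n_pairwise)
print("lists after splitting into up and down:", 2 * n_subsets)
```

Under that assumption, up- and down-regulated lists are mirror images
for the 3-vs-3 splits, so the real workload may be somewhat smaller
than a straight doubling suggests.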

However, I wonder (non-statistician here) whether it is possible that
clusters of genes, none of them significant individually, may
actually be significant as a group in differentiating subtypes? If so,
a strategy based on per-gene differential expression may leave out
clusters of genes that are actually useful.
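
The question above can be illustrated with a toy Python sketch. All
numbers are made up for illustration: 7 hypothetical genes, 8 samples
per subtype, a shift of 1 unit per gene, and a deterministic +/-1
"noise" pattern chosen so that the noise is uncorrelated between
genes. Each gene alone falls short of the 5% critical value, while the
per-sample average of the 7 genes (a simple module score) clears it.

```python
from statistics import mean, variance

N = 8                    # samples per subtype (toy value)
GENES = range(1, 8)      # 7 hypothetical genes forming one cluster
BASE, SHIFT = 8.0, 1.0   # assumed log2 baseline and per-gene shift in subtype B
T_CRIT = 2.145           # two-sided 5% critical value of Student's t, 14 d.f.

def noise(row):
    """Deterministic +/-1 pattern: one row of an 8x8 Hadamard matrix.
    Rows 1 to 7 have mean zero and are mutually uncorrelated."""
    return [(-1) ** bin(row & col).count("1") for col in range(N)]

def t_stat(x, y):
    """Two-sample t statistic (Welch form; with equal group sizes it
    equals the pooled-variance statistic)."""
    return (mean(y) - mean(x)) / (variance(x) / len(x) + variance(y) / len(y)) ** 0.5

# Toy expression values: gene -> list of N samples, for two subtypes.
subtype_a = {g: [BASE + e for e in noise(g)] for g in GENES}
subtype_b = {g: [BASE + SHIFT + e for e in noise(g % 7 + 1)] for g in GENES}

# Gene-by-gene comparison.
gene_t = {g: t_stat(subtype_a[g], subtype_b[g]) for g in GENES}

# Module score: average the genes of the cluster within each sample.
score_a = [mean(subtype_a[g][j] for g in GENES) for j in range(N)]
score_b = [mean(subtype_b[g][j] for g in GENES) for j in range(N)]
module_t = t_stat(score_a, score_b)

print("largest single-gene |t|:", max(abs(t) for t in gene_t.values()))
print("module-score t:", module_t)
print("critical value:", T_CRIT)
```

In this toy setting every single-gene statistic is about 1.87 and the
module statistic is about 4.95. Real co-expressed genes have
correlated noise, so the gain from averaging would be smaller than
here, and a cluster chosen by looking at the same data would need a
correction for that selection.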

For my data, I was hoping that ontological labelling of gene clusters
(only those clusters that significantly differentiate subtypes) would
be a viable strategy, since there seems to be some work in this
direction (goCluster and GeneXpress). I have the standard heatmap in
mind: heatmaps depict gene clusters (the usual clumps of red and
blue/green). Some of these clumps (or "modules") are clearly shared
across subtypes and some are unique to particular subtypes. Some
clusters, of course, are not subtype-specific (e.g. genes predicting
gender). The working assumption (which is certainly imperfect) is that
these clumps of genes mean something because they are co-expressed.
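
Labelling a cluster by ontology usually comes down to an
over-representation test. The Python sketch below shows the
hypergeometric calculation such tests are generally built on; it is
not the interface of goCluster or any other package. The universe of
13000 transcripts and the cluster size of 20 come from this thread;
the 300 annotated transcripts and the 5 hits are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(hits, cluster_size, annotated, universe):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from `universe` genes, of which `annotated` carry the term."""
    total = comb(universe, cluster_size)
    upper = min(cluster_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(universe - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total

universe = 13000     # transcripts on the array (from the thread)
cluster_size = 20    # minimum cluster size mentioned in the thread
annotated = 300      # hypothetical: transcripts annotated with some GO term
hits = 5             # hypothetical: annotated transcripts found in the cluster

expected = cluster_size * annotated / universe
p_value = hypergeom_upper_tail(hits, cluster_size, annotated, universe)

print("expected hits by chance:", expected)
print("P(at least", hits, "hits):", p_value)
```

Because every cluster is tested against many terms, the resulting
p-values would need a multiple-testing adjustment before a label such
as "T-cell response" is taken at face value.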

Thank you! I appreciate your advice on this.

Min-han

On Sat, 1 Jan 2005 19:14:42 -0500, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> Why not look for differentially expressed genes between groups using limma
> or some other package?  Then characterize the sets of differentially
> expressed genes by gene ontology, using a package like GOstats (with its
> GOHyperG function)?  This sounds like a more "traditional" analysis than
> what you are proposing.  Is there a reason not to look for statistically
> significant differentially expressed genes?
> 
> Sean
> 
> ----- Original Message -----
> From: "Min-Han Tan" <minhan.science at gmail.com>
> To: <bioconductor at stat.math.ethz.ch>
> Sent: Saturday, January 01, 2005 6:16 PM
> Subject: [BioC] Clustering and gene modules
> 
> > New Year greetings to all.
> >
> > I have a problem which I am not sure how best to solve, and hope to
> > seek advice from the list.
> >
> > I have 200 oligonucleotide arrays of about 13000 transcripts,
> > belonging to approximately 6 different cancer subtypes. Essentially, I
> > am hoping to first identify "gene modules" of gene expression
> > corresponding to a specific cancer subtype, or groups of subtypes.
> > (e.g. present only in A and B cancer, but not in C, D, E or F).
> > Subsequently, I wish to label these modules by gene ontology. (e.g.
> > "T-cell response" module)
> >
> > I tried a non-R program (GeneXpress) which has been used to publish
> > work in Nature Genetics, but I ran into quite a few freezes and
> > glitches with the online cancer data posted alongside the program (not
> > sure if it's a Windows issue on my side).
> >
> > I was thinking of first filtering the transcripts by variation and
> > minimum expression, then performing hierarchical clustering on the
> > final gene set and keeping gene clusters above a minimum size (e.g.
> > 20 transcripts). I would then sift through these clusters to find
> > "modules" by identifying subclusters that differentiate between
> > various combinations of cancers A, B, C, D, E and F at a minimum
> > significance level, and finally use the package goCluster to
> > identify the relevant annotations for each of these clusters.
> >
> > Any advice would be greatly appreciated. Thank you!
> >
> > Regards,
> > Min-Han Tan
> > Van Andel Institute, MI