Supervised Clustering of Genes
| Authors: | Marcel Dettling and Peter Buehlmann |
| Published: | In Genome Biology, November 25, 2002 |
| Background: | We focus on microarray data where experiments monitor
gene expression in different tissues and where each experiment is
equipped with an additional response variable such as a cancer
type. Although the number of measured genes is in the thousands, it is
assumed that only a few marker components of gene subsets determine the
type of a tissue. Here we present a new method for finding such groups
of genes by directly incorporating the response variables into the
grouping process, yielding a supervised clustering algorithm for genes.
|
| Results: | An empirical study on eight publicly available
microarray datasets shows that our algorithm identifies gene clusters
with excellent predictive potential, often superior to classification
with state-of-the-art methods based on single genes. Permutation tests
and bootstrapping provide evidence that the output is reasonably stable
and more than a noise artifact. |
| Conclusions: | In contrast to other methods such as hierarchical
clustering, our algorithm identifies several gene clusters whose
expression levels clearly distinguish the different tissue types. The
identification of such gene clusters is potentially useful for medical
diagnostics and may at the same time reveal insights into functional
genomics. |
| Software: | Available from CRAN as an R package called
supclust. A Windows
binary version is also available. |
| Length: | 15 pages |
| Reference: | Genome Biology (2002), 3(12):
research0069.1-0069.15. |
| Online version: | Click
here |
| Print version: | PDF (176k) |
| Marcel Dettling, 15.01.2004 |