Supervised Gene Clustering with Penalized Logistic Regression
Marcel Dettling and Peter Bühlmann
May 2003
Abstract
Microarray experiments generate large datasets with expression values for
thousands of genes, but no more than a few dozen samples. A
challenging task with these data is to reveal groups of co-regulated
genes whose collective expression is strongly associated with an outcome
variable of interest. To find these groups, we suggest the use of
supervised clustering algorithms, that is, procedures that use external
information about the response variables to cluster the genes. We
present Pelora, an algorithm based on penalized logistic
regression analysis that combines gene selection, supervision, gene
clustering and sample classification in a single step. In an empirical
study on six different microarray datasets, we show that Pelora
identifies gene clusters whose expression centroids have excellent
predictive potential and yield results superior to those of
state-of-the-art classification methods based on single genes. Thus, our
clusters can be beneficial in medical diagnostics and prognostics, but
they can also be very useful for functional genomics by providing
insights into gene function and regulation.
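
To make the idea of supervised clustering concrete, the following is a minimal sketch, not the Pelora algorithm itself. It assumes a single cluster, a ridge penalty on the slope, and a purely greedy forward search. The names `penalized_nll`, `fit_centroid` and `greedy_cluster`, the penalty value `lam`, and the simulated data are all hypothetical choices made for illustration. A gene joins the cluster only if averaging it into the cluster centroid lowers the penalized negative log-likelihood of a logistic regression of the class labels on that centroid.

```python
import numpy as np


def penalized_nll(beta, X, y, lam):
    """Negative Bernoulli log-likelihood plus a ridge penalty on the slope only."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta) + 0.5 * lam * beta[1] ** 2


def fit_centroid(z, y, lam=1.0, n_iter=25):
    """Ridge-penalized logistic regression of y on one centroid z (damped Newton)."""
    X = np.column_stack([np.ones_like(z), z])
    P = np.diag([0.0, lam])  # the intercept is left unpenalized
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (p - y) + P @ beta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + P
        step = np.linalg.solve(hess, grad)
        t, current = 1.0, penalized_nll(beta, X, y, lam)
        # Halve the step until the penalized criterion does not increase.
        while penalized_nll(beta - t * step, X, y, lam) > current and t > 1e-8:
            t *= 0.5
        beta = beta - t * step
    return beta, penalized_nll(beta, X, y, lam)


def greedy_cluster(X, y, lam=1.0, max_size=5):
    """Grow one gene cluster; X has samples in rows and genes in columns."""
    members, best = [], np.inf
    while len(members) < max_size:
        candidates = []
        for g in range(X.shape[1]):
            if g in members:
                continue
            # Centroid = average expression of the current cluster plus gene g.
            centroid = X[:, members + [g]].mean(axis=1)
            _, nll = fit_centroid(centroid, y, lam)
            candidates.append((nll, g))
        if not candidates:
            break
        nll, g = min(candidates)
        if nll >= best:  # no gene improves the criterion any further
            break
        best = nll
        members.append(g)
    return members, best


# Simulated data (an assumption): 40 samples, 20 genes, genes 0 and 1 carry signal.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 20))
X[:, :2] += 2.0 * y[:, None]
members, nll = greedy_cluster(X, y)
print(members, round(nll, 3))
```

Averaging genes that carry the same signal reduces the noise in the centroid, which is why a cluster can predict the outcome better than any single gene. The sketch covers only this one mechanism and should not be read as a description of the authors' procedure.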
Research Report, Seminar für Statistik.