How to Use Boosting for Tumor Classification with Gene Expression Data
Marcel Dettling and Peter Bühlmann
March 2002
Abstract
Motivation: Microarray experiments generate large datasets with
expression values for thousands of genes but no more than a few dozen
samples. Accurate supervised classification of tissue samples in such
high-dimensional problems is difficult but often crucial for successful
diagnosis and treatment. A promising way to meet this challenge is to
use boosting in conjunction with decision trees.
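To make the idea of boosting with decision trees concrete, here is a minimal sketch of generic AdaBoost with one-split trees (decision stumps) on synthetic data shaped like a microarray study: few samples, many genes. It is not the modified procedure this report proposes. The function names (`fit_stump`, `adaboost`, `predict`), the synthetic data, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def fit_stump(X, y, w):
    """Return (weighted error, gene index, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            # Weighted error of the rule "predict +1 if x[j] > thr, else -1".
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (1 if row[j] > thr else -1) != yi)
            # Weights sum to 1, so the flipped rule has error 1 - err.
            for polarity, e in ((1, err), (-1, 1 - err)):
                if best is None or e < best[0]:
                    best = (e, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] > thr else -polarity


def adaboost(X, y, rounds=10):
    """Fit an ensemble of stumps; labels in y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clip the error away from 0 and 1 so the logarithm stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, then renormalise to sum to 1.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Synthetic stand-in for expression data: 24 samples, 200 genes,
    # of which only the first 5 carry a class-dependent shift.
    rng = random.Random(0)
    y = [1 if i % 2 == 0 else -1 for i in range(24)]
    X = [[rng.gauss(yi if j < 5 else 0.0, 1.0) for j in range(200)] for yi in y]
    model = adaboost(X, y, rounds=10)
    train_acc = sum(predict(model, r) == yi for r, yi in zip(X, y)) / len(y)
    print("training accuracy:", train_acc)
```

With far more genes than samples, some stump will often separate the training data almost perfectly by chance, so training accuracy alone says little here; error should be estimated on held-out samples, for example by cross-validation.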
Results: We demonstrate that the generic boosting algorithm needs
some modifications to become an accurate classifier in the context of gene
expression data. In particular, we present a feature preselection method, a
more robust boosting procedure and a new approach for multi-categorical
problems. These modifications yield slight to drastic improvements in
performance and give competitive results on several publicly available
datasets.
Availability: Software for the modified boosting algorithms, as well
as for decision trees, is freely available in R.
Contact: Marcel Dettling
Download: Compressed Postscript (156 Kb), PDF (183 Kb)
Go back to the Research Reports from Seminar für Statistik.