How to Use Boosting for Tumor Classification with Gene Expression Data

Marcel Dettling and Peter Bühlmann

March 2002

Abstract

Motivation: Microarray experiments generate large datasets with expression values for thousands of genes but no more than a few dozen samples. Accurate supervised classification of tissue samples in such high-dimensional problems is difficult but often crucial for successful diagnosis and treatment. A promising way to meet this challenge is to use boosting in conjunction with decision trees.

Results: We demonstrate that the generic boosting algorithm needs some modifications to become an accurate classifier in the context of gene expression data. In particular, we present a feature preselection method, a more robust boosting procedure and a new approach for multi-categorical problems. These modifications yield slight to drastic improvements in performance and give competitive results on several publicly available datasets.

Availability: Software for the modified boosting algorithms as well as for decision trees is freely available in R.
Contact: Marcel Dettling

Download:

Compressed Postscript (156 Kb)
PDF (183 Kb)

Go back to the Research Reports from Seminar für Statistik.