Boosting for Tumor Classification with Gene Expression Data

 

Authors: Marcel Dettling and Peter Buehlmann

Published: In Bioinformatics, June 12, 2003

Motivation:
Microarray experiments generate large datasets with expression values for thousands of genes but no more than a few dozen samples. Accurate supervised classification of tissue samples in such high-dimensional problems is difficult but often crucial for successful diagnosis and treatment. A promising way to meet this challenge is to use boosting in conjunction with decision trees.

Results:
We demonstrate that the generic boosting algorithm needs some modifications to become an accurate classifier in the context of gene expression data. In particular, we present a feature preselection method, a more robust boosting procedure and a new approach for multi-categorical problems. These modifications yield slight to drastic improvements in performance and give competitive results on several publicly available datasets.
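As a rough illustration of what feature preselection can look like in this setting, the sketch below scores each gene with a Wilcoxon-type rank statistic on a two-class problem and keeps the top-scoring genes. The abstract does not spell out the preselection criterion, so the choice of statistic, the function names rank_score and preselect, and the toy data are assumptions made for illustration. This is not the implementation from the paper or from the R packages.

```python
def rank_score(values, labels):
    """Wilcoxon-type score for one gene (illustrative assumption).

    Counts pairs (a from class 0, b from class 1) with b < a, ties
    counting one half, then folds the count so that strong separation
    in either direction gives a high score.
    """
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    s = 0.0
    for a in class0:
        for b in class1:
            if b < a:
                s += 1.0
            elif b == a:
                s += 0.5
    total = len(class0) * len(class1)
    return max(s, total - s)


def preselect(expression, labels, k):
    """Return the indices of the k highest-scoring genes.

    expression: list of samples, each a list of gene expression values.
    labels: list of 0/1 class labels, one per sample.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append(rank_score(column, labels))
    # Python's sort is stable, so tied genes keep their original order.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:k]


# Toy data (hypothetical): 4 samples, 3 genes, 2 samples per class.
toy_expression = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [3.1, 5.5, 0.4],
    [2.9, 4.2, 0.8],
]
toy_labels = [0, 0, 1, 1]
selected = preselect(toy_expression, toy_labels, 2)
```

In this toy example, gene 0 separates the two classes perfectly and is ranked first. With thousands of genes and only a few dozen samples, a filter of this kind reduces the number of candidate variables before a classifier such as boosting with decision trees is fitted.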

Software:
The recommended option is the R package boost from CRAN, for which a Windows binary version is also available. Alternatively, you can work with the much older, original non-CRAN package LogitBoost, which contains an implementation of our modified boosting algorithm. It is available as a Linux/Unix (.tar.gz) version and as a precompiled Windows (.zip) version. Its manual (ps/pdf) contains a function index. The LogitBoost package requires the R package rpart, which provides software for decision trees. Windows users can alternatively compile the package from source; instructions are available here (ps/pdf).

Length: 9 pages

Reference: Bioinformatics (2003), Vol. 19, No. 9, p. 1061-1069

Download: PDF (105k)


Marcel Dettling, 20.10.2003