Boosting for Tumor Classification with Gene Expression Data
| Authors: | Marcel Dettling and Peter Buehlmann |
| Published: | In Bioinformatics, June 12, 2003 |
| Motivation: | Microarray experiments generate large datasets with
expression values for thousands of genes but no more than a few dozen
samples. Accurate supervised classification of tissue samples in
such high-dimensional problems is difficult but often crucial for
successful diagnosis and treatment. A promising way to meet this
challenge is by using boosting in conjunction with decision
trees. |
| Results: | We demonstrate that the generic boosting algorithm
needs some modifications to become an accurate classifier in the
context of gene expression data. In particular, we present a feature
preselection method, a more robust boosting procedure and a new
approach for multi-categorical problems. These modifications give slight to
drastic increases in performance and yield competitive results on
several publicly available datasets. |
| Software: | The recommended option is the R package boost from CRAN,
for which a Windows binary version is also available. Alternatively, the
much older original, non-CRAN package LogitBoost contains an
implementation of our modified boosting algorithm. It is available as a
Linux/Unix (.tar.gz) version and as a precompiled Windows (.zip) version.
Its manual (ps/pdf) contains a function index. The LogitBoost package
requires the R package rpart, which provides software for decision
trees. Windows users can alternatively compile the package from
source; instructions are available here (ps/pdf). |
| Length: | 9 pages |
| Reference: | Bioinformatics (2003), Vol. 19, No. 9, pp. 1061-1069 |
| Download: | PDF (105k) |
| Back / Home | Marcel Dettling, 20.10.2003 |