BagBoosting for Tumor Classification with Gene Expression Data

Marcel Dettling

March, 2004

Abstract

Motivation: Microarray experiments are expected to contribute significantly to progress in cancer treatment by enabling precise and early diagnosis. They create a need for class prediction tools that can deal with a large number of highly correlated input variables, perform feature selection, and provide class probability estimates that quantify the predictive uncertainty. A very promising solution is to combine the two ensemble schemes bagging and boosting into a novel algorithm called BagBoosting.

Results: When bagging is used as a module within boosting, the resulting classifier consistently improves on the predictive performance and the probability estimates of both bagging and boosting, on real as well as simulated gene expression data. This quasi-guaranteed improvement comes simply at the price of a greater computing effort. The empirical advantage is also clearly present when BagBoosting is compared with several established class prediction tools for microarray data.

Availability: Software for the modified boosting algorithms, for all other classifiers described in this paper, and for the benchmark studies and the simulation of microarray data is available for free as an R package.


