BagBoosting for Tumor Classification with Gene Expression Data
|Published:||In Bioinformatics, December 12th, 2004
Microarray experiments are expected to contribute significantly to progress in cancer treatment by enabling a precise and early diagnosis. They create a need for class prediction tools that can deal with a large number of highly correlated input variables, perform feature selection, and provide class probability estimates that quantify the predictive uncertainty. A very promising solution is to combine the two ensemble schemes bagging and boosting into a novel algorithm called BagBoosting.
When bagging is used as a module in boosting, the resulting classifier consistently improves on the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement comes simply at the price of a bigger computing effort. The empirical advantage is also clearly present when comparing BagBoosting to several established class prediction tools for microarray data.
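To make the idea of "bagging as a module in boosting" concrete, here is a minimal Python sketch. It is not the implementation from the paper: the choice of decision stumps as base learner, squared-error boosting on labels coded as -1/+1, the shrinkage factor, and all function names (`fit_stump`, `fit_bagged_stumps`, `bagboost`, `probability`) are assumptions made for illustration. In each boosting iteration, the single base learner is replaced by an average of base learners fitted on bootstrap samples, which is where the extra computing effort goes.

```python
import random

def fit_stump(X, r):
    """Least-squares stump on targets r: returns (feature, threshold, left, right)."""
    n, p = len(X), len(X[0])
    total = sum(r)
    # Fallback is a constant predictor (threshold -inf sends every point right).
    best_score, best = float("inf"), (0, float("-inf"), total / n, total / n)
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            left, right = left_sum / k, (total - left_sum) / (n - k)
            # Minimising squared error is equivalent to minimising this score.
            score = -(k * left * left + (n - k) * right * right)
            if score < best_score:
                best_score, best = score, (j, (lo + hi) / 2, left, right)
    return best

def predict_stump(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right

def fit_bagged_stumps(X, r, n_bags, rng):
    """Bagging module: one stump per bootstrap sample of the rows."""
    n = len(X)
    bag = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        bag.append(fit_stump([X[i] for i in idx], [r[i] for i in idx]))
    return bag

def predict_bag(bag, x):
    return sum(predict_stump(s, x) for s in bag) / len(bag)

def bagboost(X, y, n_iter=10, n_bags=10, shrinkage=0.5, seed=0):
    """Boosting loop (assumed: squared-error loss, y in {-1, +1})."""
    rng = random.Random(seed)
    F = [0.0] * len(X)  # current fit on the training rows
    bags = []
    for _ in range(n_iter):
        residuals = [yi - fi for yi, fi in zip(y, F)]
        bag = fit_bagged_stumps(X, residuals, n_bags, rng)  # bagged base learner
        bags.append(bag)
        F = [fi + shrinkage * predict_bag(bag, x) for fi, x in zip(F, X)]

    def score(x):
        return sum(shrinkage * predict_bag(b, x) for b in bags)

    return score

def probability(score_value):
    """Crude class-1 probability: the score estimates 2p - 1, clipped to [0, 1]."""
    return min(1.0, max(0.0, (score_value + 1.0) / 2.0))
```

Setting `n_bags=1` reduces the sketch to plain boosting on a single bootstrap sample per iteration, and `n_iter=1` with `shrinkage=1.0` reduces it to plain bagging, so the two parameters show where each ensemble scheme enters and why the cost grows roughly with their product.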
|Reference:||Bioinformatics (2004), Vol. 20, No. 18, is available online from the Bioinformatics webpage: click here. A slightly outdated preprint is available as PDF (334k).
For reprint requests and further information, please contact me via e-mail.
For the purpose of comparison, you can download the preprocessed microarray gene expression datasets exactly as I used them in my empirical study. They are provided as R data files and contain both the expression matrix and the response variable. Click here for the Leukemia data (2010k), Colon data (970k), Prostate data (4809k), Lymphoma data (1951k), SRBCT data (1137k) and Brain data (1837k). For information about the origin and the preprocessing of the datasets, please read my paper about Supervised Clustering of Genes.
|Related material:|| |
Our first paper about boosting for tumor classification with gene expression data is available here.
|Marcel Dettling, 20.04.2005|