Boosting for high-dimensional linear models

Peter Bühlmann

February 2004

Abstract

We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the l1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising with a strongly overcomplete dictionary when the underlying signal is sparse in terms of the l1-norm. L2Boosting is computationally attractive. We propose an AIC-based criterion for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally even more attractive, since it avoids running boosting multiple times as cross-validation, commonly used in practice, would require. We demonstrate L2Boosting on simulated data, where the predictor dimension is large relative to the sample size, and on a difficult tumor-classification problem with gene expression microarray data.
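The core algorithm behind the abstract, componentwise L2Boosting, can be sketched as follows: repeatedly fit the current residuals by least squares on the single best predictor, take a small step toward that fit, and stop at the iteration minimizing an AIC-type score computed from the trace of the boosting hat matrix. This is a minimal illustrative sketch, not the paper's exact procedure; the step size, the column standardization, and the simple (uncorrected) AIC formula used here are assumptions for illustration.

```python
import numpy as np

def l2boost(X, y, nu=0.1, max_iter=200):
    """Componentwise L2Boosting with an AIC-type stopping rule (a sketch).

    nu is the shrinkage step size; max_iter caps the number of boosting
    iterations, with AIC selecting the best iterate along the path.
    """
    n, p = X.shape
    # Standardize columns to unit norm so single-predictor least-squares
    # coefficients are simply inner products (an assumption for simplicity).
    norms = np.sqrt((X ** 2).sum(axis=0))
    Xs = X / norms
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    B = np.zeros((n, n))   # boosting hat matrix, tracked for degrees of freedom
    I = np.eye(n)
    best_aic, best_beta = np.inf, beta.copy()
    for _ in range(max_iter):
        # Componentwise least squares: pick the predictor that best fits
        # the current residuals (unit-norm columns => coefficient = x_j' r).
        coef = Xs.T @ resid
        j = int(np.argmax(np.abs(coef)))
        beta[j] += nu * coef[j]
        resid -= nu * coef[j] * Xs[:, j]
        # Hat-matrix recursion: B_m = B_{m-1} + nu * H_j (I - B_{m-1}),
        # where H_j projects onto the selected column.
        Hj = np.outer(Xs[:, j], Xs[:, j])
        B = B + nu * Hj @ (I - B)
        df = np.trace(B)
        sigma2 = float(resid @ resid) / n
        aic = np.log(sigma2) + 2.0 * df / n   # plain AIC; a corrected variant is also common
        if aic < best_aic:
            best_aic, best_beta = aic, beta.copy()
    return best_beta / norms   # coefficients on the original predictor scale
```

A single pass over the p candidate predictors per iteration makes each boosting step cheap, and tracking AIC along the path gives the stopping rule without any re-running of the algorithm, which is the computational point made in the abstract.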


Research Report, Seminar für Statistik, ETH Zürich.