Boosting for high-dimensional linear models
Peter Bühlmann
February 2004
Abstract
We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the l1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising with a strongly overcomplete dictionary if the underlying signal is sparse in terms of the l1-norm. L2Boosting is computationally attractive. We propose an AIC-based criterion for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally even more attractive, since boosting then need not be run multiple times for cross-validation, as is commonly done in practice. We demonstrate L2Boosting on simulated data, where the predictor dimension is large in comparison to the sample size, and on a difficult tumor-classification problem with gene-expression microarray data.
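For intuition, componentwise L2Boosting for a linear model can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `l2boost`, the step size `nu`, and the fixed iteration count `n_iter` are assumptions for the sketch, and the paper's AIC-based choice of the number of iterations is replaced here by a user-supplied value.

```python
import numpy as np

def l2boost(X, y, n_iter=100, nu=0.1):
    """Componentwise L2Boosting with squared error loss (sketch).

    At each iteration, fit the current residuals by simple least
    squares on each single predictor, and update only the component
    that reduces the residual sum of squares most, shrunken by nu.
    """
    X = X - X.mean(axis=0)            # center predictors
    intercept = y.mean()              # start from the constant fit
    r = y - intercept                 # current residuals
    p = X.shape[1]
    beta = np.zeros(p)
    colsq = (X ** 2).sum(axis=0)      # column sums of squares
    for _ in range(n_iter):
        gamma = X.T @ r / colsq       # simple LS coefficient per predictor
        # residual sum of squares after a full (unshrunken) step on each component
        sse = ((r[:, None] - X * gamma) ** 2).sum(axis=0)
        j = int(sse.argmin())         # best-fitting component
        beta[j] += nu * gamma[j]      # shrunken coefficient update
        r = r - nu * gamma[j] * X[:, j]
    return intercept, beta
```

Even with many more predictors than observations, only components repeatedly selected by the loop receive non-negligible coefficients, which is how the sparse l1-type structure assumed in the abstract is exploited.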
Research Report, Seminar für Statistik.