Consistent neighbourhood selection for sparse high-dimensional graphs with the Lasso

Nicolai Meinshausen and Peter Bühlmann

Abstract

The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. The structure is most conveniently summarized in a graphical model (Lauritzen 1996).

Covariance selection (Dempster 1972) aims at estimating those structural zeros from data. The computational complexity of standard covariance selection methods is, however, very high, making inference of all but low-dimensional graphs infeasible. Moreover, existence of the maximum likelihood estimate cannot be guaranteed, and the performance of the method is poor if the number of observations is small compared to the number of variables.

We propose neighbourhood selection with the Lasso as a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighbourhood selection estimates the conditional independence restrictions separately for each node in the graph.
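The node-by-node idea can be illustrated with a minimal sketch, using only the Python standard library. It regresses each variable on all others with the Lasso and records the variables that receive nonzero coefficients as that node's estimated neighbourhood. Several details are assumptions of this sketch rather than statements from the abstract: the coordinate-descent solver, the column standardization, the fixed penalty `lam=0.1`, the simulated chain graph, and the `"and"`/`"or"` rule used to merge the per-node neighbourhoods into an edge set. All function names are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of predictor columns, each a list of length n.
    """
    n, p = len(y), len(X)
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta, since beta starts at zero
    col_ss = [sum(v * v for v in col) / n for col in X]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # (1/n) x_j' (partial residual with coordinate j removed)
            rho = sum(X[j][i] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[j][i]
                beta[j] = new
    return beta


def standardize(col):
    """Centre a column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def neighbourhood_selection(columns, lam, rule="and"):
    """Estimate an edge set from one Lasso regression per node.

    rule="and" keeps an edge only if both endpoints select each other;
    rule="or" keeps it if at least one does (an assumption of this sketch).
    """
    cols = [standardize(c) for c in columns]
    p = len(cols)
    selected = []
    for a in range(p):
        others = [j for j in range(p) if j != a]
        beta = lasso_cd([cols[j] for j in others], cols[a], lam)
        selected.append({j for j, b in zip(others, beta) if b != 0.0})
    edges = set()
    for a in range(p):
        for b in range(a + 1, p):
            in_a, in_b = b in selected[a], a in selected[b]
            keep = (in_a and in_b) if rule == "and" else (in_a or in_b)
            if keep:
                edges.add((a, b))
    return edges


def simulate_chain(n, p, phi=0.8, seed=0):
    """Gaussian chain X_0 -> X_1 -> ...; only consecutive nodes are adjacent."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n)]]
    for _ in range(1, p):
        prev = cols[-1]
        cols.append([phi * v + rng.gauss(0, 1) for v in prev])
    return cols


columns = simulate_chain(n=400, p=4)
edges = neighbourhood_selection(columns, lam=0.1)
print(sorted(edges))
```

Each of the p regressions is solved independently, so the work is easy to parallelise across nodes. The penalty here is fixed by hand purely for illustration; it is not a recommended choice.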

We show that the proposed neighbourhood selection scheme is consistent for sparse high-dimensional graphs. The consistency hinges on the choice of the penalty parameter. Perhaps surprisingly, the oracle value for optimal prediction does not lead to a consistent neighbourhood estimate. We propose instead to control the probability of falsely joining some distinct connectivity components of the graph. This leads to consistent estimation for sparse graphs (with exponential rates), even when the number of variables grows like any power of the number of observations.



Go back to the Research Reports from Seminar für Statistik.