[R] leaps and bounds

Mon Aug 16 18:12:38 CEST 1999

On Sun, 15 Aug 1999, Prof Brian D Ripley wrote:

> On Sat, 14 Aug 1999, Troels Ring wrote:
> 
> > Dear friends. 
> 
> > On the Bayesian averaging homepage 
> > http://www.research.att.com/~volinsky/bma.html I found 
> > some S code some of which perhaps may run in R. 
> > There was a call to an algorithm possibly within S but not 
> > supported by R 64.1: "leaps and bounds". I guess it is a minimization
> > step. 
> > Can anyone clarify the algorithm and perhaps even give a pointer to some
> code ?

The algorithm looks for the k best fitting models of each size, using a
branch-and-bound algorithm (for which, see a computer science book). It
involves constructing a search tree and then ruling out large chunks of it
using the fact that removing variables always increases the residual sum
of squares (and possibly other constraints). The model averaging code uses
it to come up with a small (few hundreds) selection of models. These are
further trimmed by rejecting the ones that are much worse than the best
model. The resulting smallish collection of models is then averaged.

> > I guess this may be somewhat above my level of understanding 
> > but I feel much attracted to the concept of model averaging, 
> > as far as I can grasp it, and would like to experiment with it.
> 
> That page has a rather individual (and I think very limited) view of
> Bayesian model averaging. In particular, I think you are looking at links
> to Raftery's work which has been on statlib for some time, afunction called
> bicreg() that calls leaps(), an S version 2 (Blue book) function. A version
> of leaps is available in package leaps on CRAN, and may (I have not tried
> it) enable you to use bicreg.

I think someone has tried this and it didn't work at the time. However,
I don't think it was a serious problem with the leaps() library
implementation, so it may just need minor fiddling.   The glm and survival
functions will not work as they call the Fortran code directly, and R uses
different Fortran code.

> However, as leaps is an algorithm to select the best subset of size k, it
> goes directly against my understanding of model averaging, which involves
> averaging over all subsets and hence in the linear regression context is a
> form of downweighting of regression coefficients. 

I personally feel somewhat less negative about these functions, though I
agree that the theoretical basis for them is thin.  I have used them in a
couple of cases where they did give quite good shrinkage of Cox regression
coefficients and improved prediction, averaging over a few hundred models
(though I didn't compare them to 'proper' model averaging or to
techniques like ridge regression or lasso).  

You might also look at the package "br" in the Statlib S archive, which
uses model averaging for variable selection, smoothing and outlier
rejection. I don't know if it will run under R.

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._