[R] regsubsets (Leaps)

Mon Jun 4 00:56:50 CEST 2012

On Sat, Jun 2, 2012 at 3:19 AM, farmedgirl <ksteinmann at cdpr.ca.gov> wrote:
> Hi
> I need to create a model from 250+ variables with high collinearity, and
> only 17 data points (p = 250, n = 17). I would prefer to use Cp, AIC,
> and/or BIC to narrow down the number of variables, and then use VIF to
> choose a model without collinearity (if possible).  I realize that having a
> huge p and small n is going to give me extreme linear dependency problems,
> but I *think* these model selection criteria should still be useful?
>
> I have currently been running regsubsets for over a week with no results. I
> have no idea if R is still working, or if the computer is hung. I ran
> regsubsets on a smaller portion of the data, also with linear dependency
> problems, and got results. However, the hourglass continues its endless
> spiraling with the full dataset.
>
> I am running the following on Windows 7
> library(leaps)
> m_250<-regsubsets(Y~., data=model2, nbest=1, really.big=TRUE)
>
> (NOTE: The ~ is a tilde, not a dash, in the regression statement above: Y~.)
>
> Does anyone have any opinions on:
> 1) is R likely to still be running, even after a week, or should I just shut
> it down?

It's likely to be running for years.  2^250 is a large number, even
with the branch-and-bound algorithm to cut it down.
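
A back-of-the-envelope sketch of that scale. It counts every possible subset of 250 predictors, so it is an upper bound that ignores whatever branch-and-bound prunes, and the rate of one billion subsets per second is purely a hypothetical assumption:

```r
# Number of candidate subsets when each of p predictors is either in or out
p <- 250
n_subsets <- 2^p                      # roughly 1.8e75, held as a double

# Hypothetical evaluation rate (an assumption, not a measured figure)
rate_per_sec <- 1e9

# Time to enumerate everything at that rate, in years
seconds_per_year <- 60 * 60 * 24 * 365
years_needed <- n_subsets / rate_per_sec / seconds_per_year

print(n_subsets)
print(years_needed)
```

Even if pruning removed all but a minuscule fraction of the subsets, the remaining search would not finish in any practical time.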

> 2) am I doing something wrong with regsubsets?

Yes.  At the very least, set nvmax to something reasonable.  You
certainly don't want to find a model with 243 variables, so don't
waste time looking for one.
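
A minimal sketch of capping the search with nvmax. The data here are simulated stand-ins (the data frame sim and the sizes n = 50, p = 12 are hypothetical, chosen only so the example runs quickly), and the cap of 4 is an arbitrary illustration rather than a recommendation:

```r
library(leaps)

# Simulated stand-in data (hypothetical): 50 rows, 12 predictors,
# with only x1 and x3 actually related to Y
set.seed(1)
n <- 50
p <- 12
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
sim <- data.frame(Y = 2 * X[, 1] - X[, 3] + rnorm(n), X)

# Search only models with at most 4 predictors
fit <- regsubsets(Y ~ ., data = sim, nbest = 1, nvmax = 4)

# One row per model size; Cp and BIC are available from the summary
s <- summary(fit)
print(s$cp)
print(s$bic)

# Coefficients of the model with the smallest BIC
print(coef(fit, which.min(s$bic)))
```

With only 17 observations the cap would have to be far below 17 in any case, since a linear model cannot estimate more coefficients than there are data points.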

>
> 3) is there a better option than regsubsets,

Almost certainly.  regsubsets() is pretty much useless as a way of
selecting a single model, unless perhaps when p is very small.  It was
produced as a way of viewing a large collection of best models, as in
the example for the plot() method, by setting nbest fairly large.
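
A small sketch of that kind of use, in the spirit of the plot() method example: keep several best models per size and look at them together. The swiss data set (shipped with R) and nbest = 3 are choices made for this illustration only:

```r
library(leaps)

# swiss ships with R in the datasets package: 47 rows, 5 predictors
many <- regsubsets(Fertility ~ ., data = swiss, nbest = 3)

# One row per retained model, one column per term
print(summary(many)$which)

# Each row of the plot is a model, shaded by its BIC;
# filled cells mark the variables that model includes
plot(many, scale = "bic")
```

Reading across the plot shows which variables appear in nearly every good model and which are interchangeable, which is more informative than a single "winner".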

  -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland