[R] step, leaps, lasso, LSE or what?

Prof Brian D Ripley ripley at stats.ox.ac.uk
Fri Mar 1 08:26:11 CET 2002


On Thu, 28 Feb 2002, Frank, Murray wrote:

> Hi,
>
> I am trying to understand the alternative methods that are available for
> selecting variables in a regression without simply imposing my own bias
> (having "good judgement").  The methods implemented in leaps and step and
> stepAIC seem to fall into the general class of stepwise procedures.  But
> these are commonly condemned for inducing overfitting.

There are big differences between regression with only continuous variates
and regression involving hierarchies of factors.  step/stepAIC handle the
latter; the rest do not.
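
A minimal sketch (the quine data from MASS, with four factor predictors,
are purely illustrative): stepAIC adds and drops whole factor terms,
respecting marginality, rather than individual dummy variables.

  library(MASS)   # provides stepAIC and the quine data
  # main effects and all two-way interactions of four factors
  fit0 <- lm(log(Days + 2.5) ~ (Eth + Sex + Age + Lrn)^2, data = quine)
  fit1 <- stepAIC(fit0, trace = FALSE)  # steps over whole terms
  fit1$anova                            # the sequence of terms considered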

A second difference is the purpose of selecting a model.  AIC is intended
to select a model which is large enough to include the `true' model, and
hence to give good predictions.  In that setting over-fitting is not a real
problem.  (There are variations on AIC which do not assume that some model
considered is true.)   This is a different aim from trying to find the
`true' model or trying to find the smallest adequate model, both of which
are aims of explanation, not prediction.  AIC is often criticised
(`condemned') for not being good at what it does not intend to do.
[Sometimes R is, too.]
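
For reference, AIC is just minus twice the maximised log-likelihood plus
twice the number of estimated parameters; a quick check in R (any fitted
model will do, the lm below is arbitrary):

  fit <- lm(dist ~ speed, data = cars)
  ll  <- logLik(fit)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")  # AIC by hand
  AIC(fit)                                  # agrees with the extractor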

Shrinkage methods have their advocates for good predictions (including me),
but they are a different class of statistical methods; that is, *not*
regression.  They too have issues of selection: usually how much to shrink,
and often how to calibrate equal shrinkage across predictors.  In ridge
regression, choosing the ridge coefficient is not easy and depends on the
scaling of the variables.  In the neural networks field, shrinkage is
widely used.
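
A sketch with lm.ridge from MASS (the longley data are merely a convenient
example): the predictors are scaled internally, and select() reports
several standard proposals for the ridge coefficient, which need not agree.

  library(MASS)                    # lm.ridge and its select() method
  rr <- lm.ridge(Employed ~ ., data = longley,
                 lambda = seq(0, 0.1, by = 0.001))
  select(rr)   # HKB, L-W and GCV choices of the ridge coefficient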

> In Hastie, Tibshirani and Friedman "The Elements of Statistical Learning",
> chapter 3, they describe a number of procedures that seem better. The use of

I think that is a quite selective account.

> cross-validation in the training stage presumably helps guard against
> overfitting.  They seem particularly favorable to shrinkage through ridge
> regressions, and to the "lasso".  This may not be too surprising, given
> the authorship.  Is the lasso "generally accepted" as being a pretty good
> approach?  Has it proved its worth on a variety of problems?  Or is it at
> the "interesting idea" stage?  What, if anything, would be widely accepted
> as being sensible -- apart from having "good judgement"?

Depends on the aim.  If you look at the account in Venables & Ripley you
will see many caveats about any automated method: all statistical problems
(outside textbooks) come with a context which should be used in selecting
variables if the aim is explanation, and perhaps also if it is prediction.
You should use what you know about the variables and the possible
mechanisms, especially to select derived variables.  But generally model
averaging (which you have not mentioned, and which for regression is a form
of shrinkage) seems to have the most support for prediction.
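
A hand-rolled sketch of what model averaging can mean for prediction,
using Akaike weights over a purely illustrative pair of fits to the cars
data:

  fits <- list(lm(dist ~ speed, data = cars),
               lm(dist ~ poly(speed, 2), data = cars))
  aic <- sapply(fits, AIC)
  w   <- exp(-(aic - min(aic)) / 2)
  w   <- w / sum(w)                 # Akaike weights, summing to one
  new <- data.frame(speed = 21)
  sum(w * sapply(fits, predict, newdata = new))  # averaged prediction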

> In econometrics there is a school (the "LSE methodology") which argues
> for what amounts to stepwise regressions combined with repeated tests of
> the properties of the error terms. (It is actually a bit more complex
> than that.) This has been coded in the program PCGets:
> (http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html)

Lots of hyperbolic claims, no references.  But I suspect this is `ex-LSE'
methodology, associated with Hendry's group (as PcGive and Ox are), and
there is a link to Hendry (who is in Oxford).

> If anyone knows how this compares in terms of effectiveness to the methods
> discussed in Hastie et al., I would really be very interested.

It has a different aim, I believe.  Certainly `effectiveness' has to be
assessed relative to a clear aim, and simulation studies with true models
don't seem to me to have the right aim.  Statisticians of the Box/Cox/Tukey
generation would say that effectiveness in deriving scientific insights
was the real test (and I recall hearing that from those I named).

Chapter 2 of my `Pattern Recognition and Neural Networks' takes a much
wider view of the methods available for model selection, and their
philosophies.  Specifically for regression, you might take a look at Frank
Harrell's book.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

