[R-sig-eco] model selection AIC

Kingsford Jones kingsfordjones at gmail.com
Tue Oct 14 18:40:24 CEST 2008


Hi Paul,

I don't know of a function that does all-subsets selection via AIC
(step performs stepwise selection).  But if you're fitting linear
models, selection via AIC and Cp will be similar, or the same (there
is discussion of this in MASS -- I believe the two measures differ by
an additive constant when sigma^2 is known).
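
As a rough illustration of the after-the-fact approach (fit every subset, then rank by AIC), here is a minimal sketch in base R. The mtcars data and the four predictors are placeholders I picked only for illustration, and brute-force enumeration like this is only practical for a small number of candidate predictors.

```r
# Exhaustive (all-subsets) search scored by AIC, using base R only.
# The data set and predictor names are illustrative assumptions.
dat <- mtcars
response <- "mpg"
predictors <- c("wt", "hp", "disp", "qsec")

# Build every non-empty subset of the candidate predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset and record its AIC.
fits <- lapply(subsets, function(vars) {
  lm(reformulate(vars, response = response), data = dat)
})
aic <- vapply(fits, AIC, numeric(1))

# Collect the results and sort so the lowest-AIC model comes first.
results <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = " + "),
  k     = lengths(subsets),
  AIC   = aic
)
results <- results[order(results$AIC), ]
print(head(results, 5))
```

For linear models with a fixed number of predictors, AIC orders the candidates the same way the residual sum of squares does, so taking the best model of each size from regsubsets and then comparing those across sizes with AIC() should lead to the same winner as the full enumeration.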

However, all-subsets regression suffers from the same problems that
stepwise procedures do.  Frank Harrell's "Regression Modeling
Strategies" discusses these problems at length (see the bottom of this
email for a list posted by Frank to sci.stat.consult in the mid-90s).
His book also offers various solutions, and he provides the Design
and Hmisc packages to implement his suggestions.

Also, although it's always preferable to use subject area knowledge to
select models (and may be essential if your goal is interpretation
rather than prediction), there are some relatively safe methods of
automating the process offered in the lars package (safe in terms of
reducing the chance of overfitting).

hth,

Kingsford Jones



Here are some of the problems with stepwise variable selection.

1. It yields R-squared values that are badly biased to be high.

2. The F and chi-squared tests quoted next to each variable on the
printout do not have the claimed distribution.

3. The method yields confidence intervals for effects and predicted
values that are falsely narrow (See Altman and Anderson, 1989,
Statistics in Medicine).

4. It yields p-values that do not have the proper meaning, and the
proper correction for them is a difficult problem.

5. It gives biased regression coefficients that need shrinkage (the
coefficients for remaining variables are too large; see Tibshirani,
1996).

6. It has severe problems in the presence of collinearity.

7. It is based on methods (e.g., F tests for nested models) that were
intended to be used to test prespecified hypotheses.

8. Increasing the sample size doesn't help very much (see Derksen and
Keselman, 1992).

9. It allows us to not think about the problem.

10. It uses a lot of paper.

"All possible subsets" regression solves none of these problems.
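
Point 1 in the list above can be checked with a small simulation. This is only a sketch under arbitrary assumptions (the seed, sample size, and number of noise predictors are mine, not from Frank's post): the response is generated independently of every predictor, so the true R-squared is zero, yet step() is free to pick up whichever noise variables happen to lower the AIC.

```r
# Stepwise selection on pure noise: the response is unrelated to all predictors.
# Seed, n, and p are arbitrary choices for illustration.
set.seed(1)
n <- 50
p <- 20
X <- as.data.frame(matrix(rnorm(n * p), n, p))  # columns V1 ... V20
X$y <- rnorm(n)                                 # independent of V1 ... V20

null_fit <- lm(y ~ 1, data = X)  # intercept-only starting model
full_fit <- lm(y ~ ., data = X)  # defines the largest model step() may reach

# Stepwise search by AIC (the default k = 2), starting from the null model.
sel <- step(null_fit, scope = formula(full_fit),
            direction = "both", trace = 0)

# Apparent fit of the selected model; any value above 0 is selection bias.
r2 <- summary(sel)$r.squared
cat("Predictors kept:", length(coef(sel)) - 1, "\n")
cat("Apparent R-squared:", round(r2, 3), "\n")
```

Rerunning this with different seeds gives a feel for how often noise variables are retained and how large the apparent R-squared gets when nothing is really going on.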




On Tue, Oct 14, 2008 at 9:50 AM, Dave Hewitt <dhewitt37 at gmail.com> wrote:
>
>
>
>> I am a relatively new user of R, and I am trying to select a model via
>> all-subsets regression.
>>
>> I am curious about the 'leaps' and 'regsubsets' functions in the 'leaps'
>> package...
>>
>> Is there a way to use AIC as the 'method' in these functions, rather
>> than Cp, Rsquared, or adjusted Rsquared?
>>
>> I know you can run AIC on the models that leaps or regsubsets pick after
>> the fact, but can you actually make AIC the criterion?
>>
>
> Hey Paul,
>
> I think the first step with this would be to read Chapter 5, section 5.3 in
> Burnham and Anderson (2002; Model Selection and Multimodel Inference).
>
> Although I hesitate to recommend it (because of what is in B&A Ch. 5), what
> you seem to be looking for is the step() function in the base stats package.


