[R-sig-ME] stepwise model selection (of fixed effects only) using AIC?

John Maindonald john.maindonald at anu.edu.au
Tue Jan 8 03:16:20 CET 2013


Re stepwise or other variable selection approaches with lm()
(but the same issues arise more generally, including with
multi-level models), the function bsnVaryNvar() that is in more
recent versions of our DAAG package may be of some interest.
Just try running

library(DAAG)   ## bsnVaryNvar() is in the DAAG package
bsnVaryNvar(method='forward')
bsnVaryNvar(method='backward')
bsnVaryNvar()   ## Exhaustive selection

The default is to select the 'best' 3 variables from a number of
predictors that is varied between 3 and 50, with data in which
the predictors are independent gaussian noise, as is the outcome
variable.  When the best 3 variables are selected out of a number
that is in the region of 15 to 20 or so, the averaging method used 
by our function is likely to give an 'average' notional p-value that
has dropped below 0.05.  There are of course ways to account
for the selection bias, but they are non-trivial.
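
As a rough base-R sketch of the same idea (this is not the code used in
bsnVaryNvar(); the function name noise_pvals(), the sample size of 100,
the 100 repetitions, the seed and the plain arithmetic mean of p-values
are all assumptions made for illustration), one can forward-select 3
predictors from pure noise and look at the naive p-values from the
refitted model:

```r
set.seed(1)  # arbitrary seed, for reproducibility only (assumption)

## Hypothetical helper: forward-select k predictors from pure-noise data,
## then return the naive coefficient p-values from refitting lm() on the
## selected columns.
noise_pvals <- function(n = 100, nvar = 20, k = 3) {
  x <- matrix(rnorm(n * nvar), n, nvar)
  y <- rnorm(n)                      # outcome is unrelated to every column of x
  chosen <- integer(0)
  for (step in seq_len(k)) {
    candidates <- setdiff(seq_len(nvar), chosen)
    ## Residual sum of squares after adding each remaining candidate
    rss <- sapply(candidates, function(j)
      deviance(lm(y ~ x[, c(chosen, j), drop = FALSE])))
    chosen <- c(chosen, candidates[which.min(rss)])
  }
  fit <- lm(y ~ x[, chosen, drop = FALSE])
  summary(fit)$coefficients[-1, "Pr(>|t|)"]   # drop the intercept row
}

## Mean naive p-value over repeated simulations, for two pool sizes
mean_p <- sapply(c(3, 20), function(nv)
  mean(replicate(100, noise_pvals(nvar = nv))))
names(mean_p) <- c("nvar=3", "nvar=20")
print(round(mean_p, 3))
```

With only 3 candidate predictors there is no selection, so the mean
p-value should sit near 0.5; with 20 candidates the naive p-values are
pulled well below that, even though no predictor has any real effect.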

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 08/01/2013, at 10:31 AM, Diego Pujoni <diegopujoni at gmail.com> wrote:

> The default for lmer() is REML=TRUE, but the anova() default is REML=FALSE;
> see
> http://tolstoy.newcastle.edu.au/R/e6/help/09/04/11789.html
> 
> The author advises against not only step(glmer()), but any type of stepwise
> selection, including step(glm()). For the author the model has to come
> before the data (an a priori hypothesis), and the data have to provide
> evidence for or against that a priori hypothesis. Searching for the model
> that best fits the data is considered "data dredging" by the author and does
> not agree with the "philosophy" of the AIC (the existence of an
> infinite-dimensional real model).
> 
> Please note, this is not my opinion, but the author's. I'm still studying
> whether I agree with it. But as I see in the papers, this kind of analysis
> is being used more and more.
> 
> A hug
> 
> 
> 2013/1/7 Steve Taylor <steve.taylor at aut.ac.nz>
> 
>> Obrigado, Diego.  Yes I have studied a little bit of information theory,
>> tho my recollections are hazy.
>> 
>>> you can not compare combinations of fixed effects of class "mer" with
>> REML = TRUE.
>> Curious then, that that's the default value, and that the default anova()
>> does precisely that by comparing two models differing only in the fixed
>> effects included.
>> 
>> I'm aware of the objections, such as the danger of spurious relations.
>> But I cannot see why they prevent step(glmer()) when step(glm()) has been
>> a standard feature in R for many years.  The real reason seems to be the
>> fact that methods in package:stats don't work with S4 objects.
>> 
>> With my sample size, I think the difference between AIC and AICc is
>> negligible.
>> 
>> cheers,
>>    Steve
>> 
>> -----Original Message-----
>> From: r-sig-mixed-models-bounces at r-project.org [mailto:
>> r-sig-mixed-models-bounces at r-project.org] On Behalf Of Diego Pujoni
>> Sent: Tuesday, 8 January 2013 2:04a
>> To: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] stepwise model selection (of fixed effects only)
>> using AIC?
>> 
>> Hi Steve, have you heard about the Information-Theoretic Approach? It uses
>> the value of AIC (or AICc) to choose the best hypothesis among several a
>> priori hypotheses. In Anderson (2008) "Model Based Inference in the Life
>> Sciences" we see recommendations against stepwise selection (or fitting all
>> possible models) because this can easily lead to spurious relations. The
>> author recommends creating several a priori hypotheses (models), using
>> knowledge about the system, and then using AICc to look for the best of
>> them. Another thing that you have to pay attention to is the fact that
>> you can not compare combinations of fixed
>> effects of class "mer" with REML = TRUE.
>> 
>> A hug
>> 
>> --
>>                                               Diego PJ
>> 
>>        [[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
> 
> 
> 
> -- 
>                                               Diego PJ


