[R-sig-ME] stepwise model selection (of fixed effects only) using AIC?

Steve Taylor steve.taylor at aut.ac.nz
Mon Jan 7 21:46:38 CET 2013


Thanks Ben.  

No, drop1 doesn't work:
Error: $ operator not defined for this S4 class

OK, I'll use the backward direction in preference to forward.  

The fixed-effect covariates were all selected because there is some evidence in the literature that they may be associated with the outcome or act as confounders.  So I could declare, a priori, that my hypothesised model includes all of these covariates and their interaction terms with Age.  

Would it then be acceptable to the detractors of stepwise selection for me to start from that full model and simplify it via backward elimination?
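
For concreteness, here is a sketch of what I have in mind; x1, x2, x3 are placeholders for my actual covariates, and because drop1() errors for me I would refit each candidate reduction with update() and compare AICs by hand:

library(lme4)

# Full hypothesised model: all covariates plus their Age interactions,
# with a random intercept per participant (x1, x2, x3 are placeholders).
full = glmer(Outcome ~ (1|Participant) + Age*(x1 + x2 + x3),
             data=modeldata, family=binomial)

# One backward step by hand: refit without each droppable term
# (respecting marginality, so only the interactions at first) and
# compare AICs; drop the term whose removal lowers the AIC most,
# then repeat until no removal improves on the current model.
drops = c("Age:x1", "Age:x2", "Age:x3")
aics = sapply(drops, function(tt)
    AIC(update(full, as.formula(paste(". ~ . -", tt)))))
c(current = AIC(full), aics)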

S

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Monday, 7 January 2013 5:26 p.m.
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] stepwise model selection (of fixed effects only) using AIC?

Steve Taylor <steve.taylor at ...> writes:

> 
> Hello mixed modellers,
 
> I note that one can fit mixed models and compare them by AIC using
>  anova(), as in example(glmer).
 
> But step() doesn't work on the models produced by glmer() -
> evidently because they are S4 objects, rather than some more
> philosophical objection to doing so.

It's sort of a combination, I think: on the one hand I wouldn't
stop someone who *really* wanted to from doing stepwise model
selection; (1) it's a free country and (2) there are *occasionally*
good reasons to do so.  On the other hand, I don't feel particularly
inspired to put a lot of effort into implementing this capability,
because it's so rarely a good idea.

> I want to select among combinations of fixed effect components only,
>  as one would do with step(glm()).
 
> My data set consists of two repeated measurements (say, at age 9 and
> age 11) on each of about 900 people. There are about 8 potential
> fixed effect covariates that I wish to evaluate, including
> interaction terms with Age (as a two-level factor).  The only random
> effect I'm assuming is an intercept per participant.

  I *think* that drop1() should work; this will allow you to do
backward stepwise selection in not more than 8 steps or so.  That is,
at each stage drop1() will tell you which remaining term, when
dropped, improves the model most (i.e. lowers the AIC most).  At
worst, you will have to run drop1(); update(model, . ~ . - worstpredictor)
8 times before you end up at the intercept-only model.  (Hopefully
you will get to stop before then, once none of the reduced models
has a lower AIC than the current model.)
 
> Can anyone help with a way to do this?
> 
> Here's my attempt, ignoring the random effect during model selection.  
> Is this a reasonable thing to do?
> 
> modeldata1 = subset(modeldata, select= -Participant)
> glm0 = glm(Outcome ~ Age, data=modeldata1, family=binomial)
> glm1 = glm(Outcome ~ .*Age, data=modeldata1, family=binomial)
> scope = list(lower=formula(glm0), upper=formula(glm1))
> glm2 = step(glm0, scope, direction='forward')

  I generally prefer backward to forward selection ...
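
  With the quoted code, the backward version would be roughly the
following (reusing glm1 and scope from above; glm2b is just a name for
the backward-selected fit):

  glm2b <- step(glm1, scope=scope, direction='backward')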

> mm0 = glmer(Outcome ~ (1|Participant) + Age, 
>       data=modeldata, family=binomial)
> mm2 = update(mm0, formula=update(formula(glm2),'. ~ (1|Participant) + .'))
> 
> Interestingly, the results are pretty much the same, from:
> library(effects)
> plot(allEffects(glm2))
> plot(allEffects(mm2))



