[R-sig-ME] Mixed effects model selection

Ben Bolker bolker at ufl.edu
Fri Mar 26 04:34:40 CET 2010


 <hamish.wilman at ...> writes:

> I am trying to perform model selection starting from an overly parameterized
> model of the form:
> 
> mdl1<- lmer(Y ~ Sa + Sb + c + d + e + f + (c + d + e + f|SpeciesID), 
> REML = F)
> 
> where Sa and Sb are species-level 
> predictors and the others are population level
> predictors, so can potentially vary within species (hence their inclusion as
> slopes in the random effects term).
> 
> In Zuur et al 2009 it is suggested that one start with a full 
> model like this,
> first performing selection on the random effects, 
> then moving on to selection
> on the fixed effects.
> 
> The procedure I have been using is to remove one item at a 
> time from the random
> effects, in effect fitting four new models, each missing one of the slope
> parameters in the random effects, then using likelihood ratio tests (LRT) to
> compare them to the full model. I throw out the variable with the lowest
> non-significant LRT statistic (realizing that the p-values are
> conservative). I then repeat this procedure until all variables are
> "indispensable" as defined by the LRT.
> 
> I do this with the fixed effects after selecting for the random effects. The
> justification in Zuur for doing the selection on the random effects first, 
> with all possible fixed effects in there, is that any variation 
> potentially explained
> by the fixed effects should stay there and the 
> random effects then give you what
> doesn't show up in the fixed effects.
> 
> First of all, does this model selection procedure sound reasonable?

  While I generally like Zuur et al's approach, the one place where
I differ fairly strongly is in model selection.  There is now a *lot*
of literature out there that explains the dangers of stepwise approaches;
Harrell's 2001 book is probably the most coherent and comprehensive.
My compromise is that I am willing to discard terms (a) to allow the
model to fit (i.e., to deal with convergence and other numerical problems)
and (b) to simplify interpretation (i.e., getting rid of higher-order
interactions can make interpretation of everything else much easier).

  If your full model appears to fit adequately (i.e. it doesn't
break the fitting software), and if you want to test hypotheses
about the individual effects, then I would say you
should not do any model reduction (others may disagree).
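  For what it's worth, a single step of the LRT procedure you describe
would look something like the sketch below (simulated data and variable
names are made up for illustration; this is not an endorsement of the
stepwise procedure):

```r
## One step of a random-effects LRT comparison, on simulated data.
## Requires the lme4 package; 'c' and 'd' here stand in for two of
## your population-level predictors.
library(lme4)

set.seed(1)
dat <- data.frame(
  SpeciesID = rep(factor(1:20), each = 10),
  c = rnorm(200),
  d = rnorm(200)
)
dat$Y <- 1 + dat$c + dat$d + rnorm(200)

## "Full" model with random slopes for both c and d, and a reduced
## model dropping the random slope for d; both fitted by ML (REML = FALSE)
## so the likelihoods are comparable
full <- lmer(Y ~ c + d + (c + d | SpeciesID), data = dat, REML = FALSE)
red  <- lmer(Y ~ c + d + (c | SpeciesID),     data = dat, REML = FALSE)

## Likelihood ratio test; the naive p-value is conservative because the
## null hypothesis (variance = 0) lies on the boundary of the parameter space
anova(red, full)
```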

> 
> Now, when I complete this procedure on my model, 
> the resulting best model looks
> like this:
> 
> mdl.final<- lmer(Y ~ Sa + Sb + c + d + (e + f|SpeciesID), REML = F)
> 
> My question is whether you can have variables in 
> the slope portion of the random
> effects that are not in the fixed effects,
> and if so what is the interpretation
> of their "influence"?

   It only makes sense in the situation where the hypothesis
that the species-level average effect of e and f across populations
is *exactly* zero is plausible, which seems somewhat dubious.  This is
analogous to dropping the intercept term from a regression while
retaining the slope: it's not impossible to find situations where
a regression that goes exactly through the origin is a reasonable
hypothesis, but it's unusual.
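  The regression analogy is easy to see in base R (simulated data,
purely for illustration):

```r
## Dropping the intercept while keeping the slope forces the fitted
## line through the origin -- only sensible if the true intercept is 0
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

with_int    <- lm(y ~ x)      # usual regression: intercept + slope
through_org <- lm(y ~ x - 1)  # slope only, intercept constrained to 0

coef(with_int)     # two coefficients; intercept estimated from the data
coef(through_org)  # a single slope coefficient
```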




More information about the R-sig-mixed-models mailing list