[R-sig-ME] Mixed effects model selection

Wed Mar 24 16:01:47 CET 2010

Hello all, I'll get right into it:

I am trying to perform model selection starting from an overly parameterized
model of the form:

mdl1 <- lmer(Y ~ Sa + Sb + c + d + e + f + (c + d + e + f | SpeciesID), REML = FALSE)

where Sa and Sb are species-level predictors and the others are population level
predictors, so can potentially vary within species (hence their inclusion as
slopes in the random effects term).

In Zuur et al. (2009) it is suggested that one start with a full model like
this, first perform selection on the random effects, and then move on to
selection on the fixed effects.

The procedure I have been using is to remove one term at a time from the random
effects, in effect fitting four new models, each missing one of the slope
parameters in the random effects, then using likelihood ratio tests (LRT) to
compare each of them to the full model. I drop the slope whose removal gives the
smallest non-significant LRT statistic (realizing that the p-values are
conservative). I then repeat this procedure until all remaining slopes are
"indispensable" as defined by the LRT.

I then apply the same procedure to the fixed effects, once the random effects
have been selected. The justification in Zuur for selecting the random effects
first, with all possible fixed effects in the model, is that any variation the
fixed effects can explain should stay with them, so that the random effects
only pick up what the fixed effects do not account for.
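
To make the random-effects step concrete, here is how I understand the first
round of comparisons, assuming lmer fits an unstructured covariance matrix for
the intercept and the four slopes (the symbols \Lambda, L_full and L_red are
just my own notation, not anything from Zuur):

```latex
% LRT statistic for the full model versus a model lacking one random slope
\[
\Lambda = 2\left[\log L_{\text{full}} - \log L_{\text{red}}\right]
\]
% Number of covariance parameters for q correlated random effects: q(q+1)/2
%   full model (intercept + c, d, e, f): q = 5, so 15 parameters
%   reduced model (one slope removed):   q = 4, so 10 parameters
\[
\Delta\mathrm{df} = \frac{5 \cdot 6}{2} - \frac{4 \cdot 5}{2} = 15 - 10 = 5
\]
```

So, if I have this right, each of the four comparisons refers \Lambda to a
chi-square distribution with 5 degrees of freedom (one variance plus four
covariances), and the p-value is conservative because the null hypothesis puts
a variance at zero, the edge of its allowed range.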

First of all, does this model selection procedure sound reasonable?

Now, when I complete this procedure on my model, the resulting best model looks
like this:

mdl.final <- lmer(Y ~ Sa + Sb + c + d + (e + f | SpeciesID), REML = FALSE)

My question is whether you can have variables in the slope portion of the random
effects that are not in the fixed effects, and if so what is the interpretation
of their "influence"?

My understanding was that the slopes in the random effects are deviations from
the fixed effects parameter estimates, which seems to only make sense if the
variable also shows up as a fixed effect.
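
Written out, this is how I read the final model for population i in species j,
assuming lmer adds a random intercept by default and treats the species effects
as jointly normal with mean zero (the \beta and b symbols are my own notation):

```latex
\[
Y_{ij} = \beta_0 + \beta_a\,Sa_j + \beta_b\,Sb_j + \beta_c\,c_{ij} + \beta_d\,d_{ij}
       + b_{0j} + b_{ej}\,e_{ij} + b_{fj}\,f_{ij} + \varepsilon_{ij}
\]
\[
(b_{0j},\, b_{ej},\, b_{fj})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
% Slope of e in species j: \beta_e + b_{ej}, with \beta_e fixed at 0
% because e has no fixed-effect term (and likewise for f).
```

If that is right, the species-level slopes for e and f are deviations from an
average slope that is forced to be exactly zero, rather than from an estimated
fixed effect.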

Any assistance would be hugely appreciated, even if it is just telling me I'm a
big dummy. I have looked through a lot of other literature, including Pinheiro
and Bates (2000), and haven't found anything that explicitly deals with
selection for models with this many variables and how it should be approached
(except Zuur, whose method gives me the above result).

Thanks so much!

Regards,

Ham