[R-sig-ME] lmer models-confusing results - more information!

Fri Dec 4 18:29:36 CET 2009

Manuel Morales wrote:
> On Thu, 2009-12-03 at 14:43 -0500, Ben Bolker wrote:
>> Gwyneth Wilson wrote:
>>> I have been running lmer models in R, looking at what affects
>>> reproductive success in Ground Hornbills (a South African bird). My
>>> response variable is breeding success and is binomial (0-1) and my
>>> random effect is group ID. My explanatory variables include rainfall,
>>> vegetation, group size, year, nests, and proportion of open woodland.
>>>
>>>
>>> I have run numerous models successfully, but I am confused about what
>>> the outputs mean. When I run my first model with all my variables (all
>>> additive), I get a low AIC value with only a few of the variables
>>> being significant. When I take out the variables that are not
>>> significant, my AIC becomes higher but I have more significant
>>> variables! When I keep taking out the non-significant variables, I am
>>> left with a model that has nests, open woodland, and group size as
>>> being extremely significant BUT the AIC is high! Why is my AIC value
>>> increasing when I have fewer variables that are all significant and
>>> seem to be best explaining my data? Do I look at only the AIC when
>>> choosing the 'best' model, or do I look at only the p-values? Or both?
>>> The model with the lowest AIC at the moment has the most variables,
>>> and most of them are not significant.
>>    This happens a lot when you have correlated variables. Although I
>> don't agree with absolutely everything it says, Zuur et al. 2009 is a
>> good start for looking at this. When you have correlated variables, it's
>> easy for them collectively to explain a lot of the pattern but
>> individually not to explain much.
>>
>> Zuur, A. F., E. N. Ieno, and C. S. Elphick. 2009. A protocol for data
>> exploration to avoid common statistical problems. Methods in Ecology and
>> Evolution. doi: 10.1111/j.2041-210X.2009.00001.x.
>>
>>   In general, you should *either* (1) fit all sensible models and
>> model-average the results (if you are interested in prediction) or (2)
>> use the full model to evaluate p-values, test hypotheses, etc. (provided
>> you have _already_ removed correlated variables).  As a rule (although
>> Murtaugh 2009 provides a counterexample of sorts), you should **not**
>> select a model and then (afterwards) evaluate the significance of the
>> parameters in the model ...
> 
> Is this in the context of non-nested models? Otherwise, a very common
> scenario is to test interaction terms first and then remove them from
> the model if they are not significant (i.e., so that the significance
> of the main effects can be tested).

  Yes.  I think removing interactions technically violates the
"don't select models and then test them" rule, but it also seems
reasonable to remove a _small_ number of non-significant interactions on
the grounds of interpretability.  (I believe Pinheiro and Bates do this
to some extent in the example in their 2000 book.)
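
  On the original question of why a model full of "non-significant"
terms can still have the lowest AIC: the correlated-predictor effect is
easy to reproduce in a small simulation. The sketch below is only an
illustration under stated assumptions, not the hornbill analysis. It uses
made-up Gaussian data and ordinary least squares rather than a binomial
mixed model, it is written in plain Python so that it is self-contained,
and the names x1, x2, solve and fit_ols are hypothetical. The predictor
x2 is constructed as a noisy copy of x1, so the two are almost perfectly
correlated.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit with an intercept; returns coefficients, t-values, logLik, AIC."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2
              for i in range(n))
    sigma2 = rss / (n - p)  # unbiased residual variance, used for standard errors
    t = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        inv_jj = solve(xtx, unit)[j]  # j-th diagonal element of (X'X)^-1
        t.append(beta[j] / math.sqrt(sigma2 * inv_jj))
    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    k = p + 1  # regression coefficients plus the residual variance
    return {"coef": beta, "t": t, "loglik": loglik, "aic": 2 * k - 2 * loglik}


random.seed(1)  # fixed seed so the simulated (made-up) data are reproducible
n = 100
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + random.gauss(0, 0.05) for v in x1]  # nearly a copy of x1
y = [u + v + random.gauss(0, 1) for u, v in zip(x1, x2)]

full = fit_ols([x1, x2], y)   # both correlated predictors
only_x1 = fit_ols([x1], y)    # one of the pair dropped
null = fit_ols([], y)         # intercept only

for name, fit in [("x1 + x2", full), ("x1 only", only_x1), ("intercept only", null)]:
    t_values = ", ".join(f"{v:.2f}" for v in fit["t"][1:]) or "none"
    print(f"{name:15s} AIC = {fit['aic']:8.2f}   slope t-values: {t_values}")
```

  With predictors this strongly correlated, the slope t-values in the
two-predictor fit will typically be small, because the data cannot say
which of x1 and x2 is doing the work, even though the pair together
explains most of the variation and the AIC is far below that of the
intercept-only fit. Dropping one of the pair typically makes the
remaining slope look strongly "significant". Neither set of p-values is
telling you much about which variable matters.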

  cheers
   Ben