[R-sig-ME] Reply to subject "Interpretation of lmer output in R"

Wed Dec 14 17:37:41 CET 2011

On Wed, Dec 14, 2011 at 9:58 AM, staffan at myrica.se <staffan at myrica.se> wrote:
> Dear Douglas!
> I hope it is alright to ask you a question directly about glmer.

I have taken the liberty of cc:'ing the
R-SIG-Mixed-Models at R-project.org mailing list on my reply as others
who read that list may have suggestions on both the comparisons and
the manner in which the results can be reported so as to be accessible
to ecologists or ornithologists.

>
> In a quite long discussion during last february with Julia Sommerfeld, there
> was the qesution below and your answer to it:
>
>> 2. Since Sex is not "important",  I compared fm1 with fm2 to test if
>> BreedSuc1 is an important factor:
>>
>> fm2<-lmer(SameSite ~ 1 + (1|Bird), family="binomial")
>> anova(fm2,fm1)
>>
>> Data:
>> Models:
>> fm2: SameSite ~ 1 + (1 | Bird)
>> fm1: SameSite ~ BreedSuc1 + (1 | Bird)
>>         Df    AIC        BIC      logLik      Chisq     Chi Df
>>  Pr(>Chisq)
>> fm2  2     77.617     81.595     -36.808
>> fm1  3     75.518      81.485     -34.759     4.0991          1
>>  0.04291 *
>
> "So if I needed to quote a p-value for the BreedSuc factor this is what
> I would quote."
>
> So I am not sure which P value I now should use in my master thesis. Below I
> post my two results:
>
> First: a model that have been backward eliminated of non-significant terms
> from the model:
>> model1<-glmer(survival ~ 1 + (sociability * boldness * size * density) +
>> (1 |      trial) ,data=data,family=binomial)
> and the result for lets take boldness is:
>>boldness   -6.597e-01   3.173e-01    -2.079  0.0376 *   (so I have a
>> significant P-value)
>
> Second: I do what you imply Julia to do, I compare the two models, the full
> one, model1 with the one excluding boldness:
>
>> anova(model1,model2)
> Data: data
> Models:
> model2: survival ~ 1 + (sociability * boldness * size * density) + (1 |
> model2:     trial) - boldness
> model1: survival ~ 1 + (sociability * boldness * size * density) + (1 |
> model1:     trial)
>        Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
> model3 16 167.809 211.305 -67.905
> model2 17 158.319 204.533 -62.159 11.491      1  0.0006995 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> SO, if I would like to show my results in a table, which P-value should I
> present, 0.0376 or 0.0006995? and what else?, the z-value (-2.079) or Chisq
> (-62.159)
> Could you maybe explain your answer a litte bit more in detail, but still
> keep it simple?

The z-value is a rather coarse approximation based on only one model
(the alternative) having been fit to the data.  It is created from the
best guess, based on that single fitted model, as to what the quality
of the fit to the null model will be.  When the null model (the one
without the boldness term) is fit to the data, it turns out that the
fit is much worse than would have been expected based on the earlier
fit.

Because the likelihood ratio test is based on fitting both the
alternative model and the null model the chi-squared statistic is a
better comparison of the difference in quality of the two fits.  If
the approximation based on fitting a single model was reasonable then
the square of the z statistic should be close to the chi-squared
statistic and it isn't.

So the short answer is that the likelihood-ratio test based on
actually fitting both the alternative model and the null model is
preferred to the z-test which is based on fitting only the alternative
model.

Having said that, I am not sure that either of these tests is all that
meaningful due to the nature of the models.  Generally we apply a
hierarchical principle to the terms in a model which means that if we
remove a main-effects term like boldness then we also remove all the
interactions involving that term.  Here you are maintaining the
interactions with boldness but removing the main effect.  Without
strong justification for doing so the form of the model would be
questioned.  Also, a generalized linear mixed model with a binary
response (I am assuming survival is binary) incorporating fourth-order
interactions is likely to be overfitting the data.  I would be much
more conservative in building the model, starting with main effects
and adding interactions as necessary.