[R-sig-ME] Interpretation of lmer output in R
Douglas Bates
bates at stat.wisc.edu
Sat Feb 19 16:04:02 CET 2011
Thank you for your questions and for transferring the discussion to
the R-SIG-Mixed-Models mailing list, as we had discussed. I have also
copied the mailing list for a class on mixed-effects models that I am
teaching.
I particularly appreciate your desire to learn about the model instead
of just quoting a p-value. I often lament to my classes that
statisticians have been far too successful in propagating the idea of
p-values, to the extent that some researchers believe that is all that
is needed to learn about an analysis.
On Sat, Feb 19, 2011 at 3:05 AM, Julia Sommerfeld
<Julia.Sommerfeld at utas.edu.au> wrote:
> Dear Douglas and list members,
>
> Apologies in advance if you might consider my questions as too simple to
> be asking the godfather of lme4 for an answer...thus, please feel free to
> ignore my email or to forward it to someone else.
>
> I'm a PhD student (Australia/Germany) working on tropical seabirds. As
> many of my PhD-collegues, I'm having some difficulties with the analysis
> of my data using lmer (family=binomial). While some say: What do you care
>
> about all the other values as long as you've got a p-value... I do believe
> that it is essential to understand WHAT I'm doing here and WHAT all these
> numbers/values mean.
>
> I've read the Chapters (lme4 Book Chapters) and publications about the use
> of lmer and searched the forums - but I don't find a satisfying answer.
> And I have the feeling that 1. the statistic lecture at my university was
> a joke (sad to say this) 2. that I need a huge statistical/mathematical
> background to fully understand GLMMs.
>
>
> One of the question I would like to answer is:
> Does the previous breeding success influences nest site fidelity?
>
> I have binomial data:
> SameSite=1 means birds use the same site
>
> SameSite=0 means birds change nest site
>
> BreedSuc1=1 Birds were successful in previous breeding season
> BreedSuc1=0 Birds were not successful " " "
>
> Sex= male, female
> Bird= Bird ID
>
> This is my model:
> fm<-lmer(SameSite~BreedSuc1+Sex+(1|Bird), family="binomial")
> where Bird is my random factor (same birds were sampled more than once)
One thing to note is that there are 46 different birds in the 54
observations. Most birds will have just one observation so a random
effect for bird may not be necessary.
> summary(fm)
>
> Generalized linear mixed model fit by the Laplace approximation
>
> Formula: SameSite ~ BreedSuc1 + Sex + (1 | Bird)
> AIC BIC logLik deviance
> 77.38 85.34 -34.69 69.38
> Random effects:
> Groups Name Variance Std.Dev.
> Bird (Intercept) 0.14080 0.37524
> Number of obs: 54, groups: Bird, 46
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.3294 0.4890 -0.674 0.5006
> BreedSuc11 1.1988 0.5957 2.012 0.0442 *
> SexM 0.2215 0.5877 0.377 0.7062
this suggests that sex is not an important factor in the model. The
(Intercept) term is close to zero, relative to its standard error, but
we would retain it in the model as explained below.
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
> (Intr) BrdS11
> BreedSuc11 -0.536
> SexM -0.628 0.065
>
>
> From this summary output I do understand that the Breeding Success has a
> significant effect on nest-site fidelity (p<0.05).
Yes, but ... this p-value should be used as a guide only. As
described below a p-value must be viewed in context. It is not a
property of the Breeding Success factor; it comes from a comparison of
two models and we should bear in mind that these models are before
interpreting this number.
The interpretation of a p-value for a particular coefficient is that
it is an approximation to the p-value we would get from comparing the
model that has been fit to the mode fit without this particular
coefficient. In this case the coefficient corresponds to one of the
terms in the model and I would advocate performing a likelihood ratio
test comparing the two models
fm <- glmer(SameSite~BreedSuc1+Sex+(1|Bird), family="binomial")
fm0 <- glmer(SameSite~Sex+(1|Bird), family="binomial") # the null
hypothesis model
anova(fm0, fm)
Even though the function is called anova it will, in this case,
perform a likelihood ratio test (LRT). It also prints the values of
AIC and BIC if you prefer to compare models according to one of those
criteria but I prefer using the likelihood ratio for nested models.
However, before doing that comparison you should ask yourself whether
you want to compare models that have the, apparently unnecessary term
for Sex in them. The way I would approach the model building is first
to reduce the model to
fm1 <- lmer(SameSite~BreedSuc1+(1|Bird), family="binomial")
You could then compare
anova(fm1, fm)
which I presume will give a large p-value for the LRT, so we prefer
the simpler model, fm1. After that, I would compare
fm2 <- lmer(SameSite ~ 1 + (1|Bird), family="binomial")
anova(fm2, fm1)
to see if the BreedSuc1 factor is an important predictor in its own right.
Note that we don't drop the implicit "(Intercept)" term, even though
it has a high p-value in the coefficient table. The reason is that
the interpretation of the (Intercept) coefficient depends on the
coding of BreedSuc1.
In model fm, the log-odds of site fidelity for a female bird (i.e.
SexM is 0) who was unsuccessful in breeding the previous season (i.e.
BreedSuc1 is 0) is -0.3294, corresponding to a probability of about
42%
> plogis(-0.3294)
[1] 0.4183866
The log-odds of site fidelity for a female bird who was successful in
breeding is -0.3294 + 1.1988, corresponding to a probability of about
70%
> plogis(-0.3294 + 1.1988)
[1] 0.7046208
If you had reversed the meaning of BreedSuc to BreedFail, where 0
indicates no failure at breeding and 1 indicates failure, then the
coefficient would change sign (i.e. the coefficient for BreedFail
would be -1.1988) and the intercept would change to
> -0.3294 + 1.1988
[1] 0.8694
because the reference level would now be a female bird who was
successful in breeding.
Because the interpretation of the intercept depends upon the coding of
other factors, we retain it in the model whenever other terms are
retained.
> But what else can I conclude from this model?
>
> Questions:
>
> 1.Random effects: What does the Random Effect table - the Variance, Std.
> Dev. and Intercept - tells me: Is there a random effect that my model has
> to account for?
First I would remove the apparently unnecessary Sex term then,
ideally, I would check by comparing the fit of the reduced model to
that of a GLM without the random effect for Bird. Unfortunately, I
don't think the definition of deviance for a glm fit is compatible
with that for a model fit by glmer. This is something we will need to
fix. For the time being I would instead examine the "caterpillar
plot" obtained with
dotplot(ranef(fm1, postVar=TRUE))
which represent the 95% prediction intervals for each of the birds.
If these all overlap zero comfortably I would conclude that the random
effect is not needed an fit a glm without a random effect for bird.
> Random effects:
> Groups Name Variance Std.Dev.
> Bird (Intercept) 0.14080 0.37524
> Number of obs: 54, groups: Bird, 46
That estimated standard deviation is fairly large. We would expect a
range of contributions on the log-odds scale of about +/- 2 sd which,
at this point of the logistic curve corresponds to considerable
variability in predicted probabilities for birds with the same
characteristics.
> 2. Fixed Effects: Again the Intercept? Not sure if I understand the
> meaning of it (sorry, explanation in Chapter I also doesn't help much)
Actually in this model it is a bit different from the models described
in chapter 1. I hope the explanation above makes sense. Think of it
as the log-odds of site fidelity for a bird in the "reference group"
where reference group means that all the other variables are a the
zero level.
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.3294 0.4890 -0.674 0.5006
> BreedSuc11 1.1988 0.5957 2.012 0.0442 *
>
> SexM 0.2215 0.5877 0.377 0.7062
>
> 3. Meaning of the z-value? Why shall I mention it in te result section?
I would regard the z-value as an approximation. The quantity of
interest is the likelihood ratio test statistic which has a
chi-squared distribution under the null hypothesis (i.e. the term can
be deleted from the model without getting a significantly worse fit).
It happens that this would be a chi-squared distribution with 1 degree
of freedom, which corresponds to the square of a standard normal
distribution. You should see a LRT test statistic close to, but not
exactly the same as, the square of the z value when you compare the
models with and without that term. This is the sense in which the
z-value is an approximation. To me the LRT statistic is more reliable
because it is based upon actually refitting the model.
> 4. Estimate and Std. Error of the fixed effects? How can I tell from these
> values WHAT kind of effect (positiv, negativ?) these parameter have on
> nest-site fidelity? Do birds that were successful during the previous
> breeding success show a higher nest-site fidelity? Remember, I have
> binomial data...
That is described above. If you want the estimate of the site
fidelity for bird with certain characteristics you evaluate the
corresponding combination of coefficients and apply plogis to the
result.
> I would highly appreciate your feedback and/or suggestions of
>
> papers/chapters I could read for a better understanding of the output.
>
> Best regards,
>
>
> Julia
I hope this helps.
More information about the R-sig-mixed-models
mailing list