[R-sig-ME] How do you report lmer results?
Ben Bolker
bbolker at gmail.com
Thu Jul 28 15:37:54 CEST 2011
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 07/27/2011 04:01 AM, Luke Duncan wrote:
> Dear R-Gurus
>
> I am a PhD student from South Africa working on chimpanzee behaviour.
> I am looking at patterns of shade utilization and am using generalized
> linear mixed models to examine the effects of various factors on
> whether chimpanzees choose to spend time in the sun or shade.
OK. Are these binary (sun vs shade) outcomes?
I
> realise that the lme4 package and the outputs of the lmer functions
> have been discussed ad nauseum but I have been reading through many of
> them and am finding it all extremely confusing. I have used programs
> like Statistica to run glm's with no random factors but now that I
> have to include random effects, this is no longer an option. Thus I
> have turned to R (and hence I am a complete R virgin).
This is fine; you should mention that you previously tried on r-help
(this is a more appropriate list anyway).
>
> What I would like to know is the following. What is the accepted
> general consensus on how to report the outputs of a lmer model? What
> is the currently accepted method for determining whether fixed effect
> parameters are significant in predicting the outcomes of the model
> (LHR, AIC, Wald X^2...?)? While I recognise that the "Pr(>|z|)" value
> is not a definitive p-value (rather an approximation), can one treat
> it loosely as an 'estimated' p-value?
Hard to say. You could try searching google scholar for references to
"lmer" or "lme4", for a start, although there aren't a lot of hits
there. I would recommend Bolker et al _Trends in Ecology and Evolution_
2008, too (of course); it might also be useful to look in Zuur's book on
mixed models.
>
> My model comprises 2 categorical predictor variables (Time of day:
> 'Time'; Available amount of shade, coded as a three-way
> classification: 'Tertile'), two continuous predictor variables
> (maximum temperature: 'Max'; minimum temperature: 'Min') and three
> random effects (Which experimental dataset the data were derived from:
> 'Exp'; Which individual chimpanzee was observed: 'Indiv'; Which
> area/zone of the enclosure they occupied at the time of observation:
> 'Zone'). These are the outputs that I have generated thus far using
> LHR testing. How should I be interpretting and reporting these
> outputs?
>
> Generalized linear mixed model fit by the Laplace approximation
> Formula: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone)
> + Max + Min
> Data: sdata
> AIC BIC logLik deviance
> 215.5 259 -95.77 191.5
> Random effects:
> Groups Name Variance Std.Dev.
> Zone (Intercept) 2.6596e-01 5.1571e-01
> Indiv (Intercept) 0.0000e+00 0.0000e+00
> Exp (Intercept) 2.9021e-11 5.3871e-06
One thing that's clear here is that your model is a bit overfitted --
you are getting zero or near-zero variances for both 'indiv' and 'exp'.
That doesn't necessarily mean anything is wrong, but you might want to
test whether the model performs differently if you drop these random
effects, or whether it gives significantly different results with
different combinations of random effects (again, unlikely but worth
checking).
It's also not surprising that you get zero variance for Exp, with only
two levels. The numbers of random-effects levels here are very small --
7 or 8 is probably the *minimum* you could expect to work.
Opinions differ (see <http://glmm.wikidot.com/faq>), but I would suggest
you make Exp a fixed factor ...
> Number of obs: 276, groups: Zone, 8; Indiv, 7; Exp, 2
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -2.15725 1.58304 -1.363 0.17297
> Time11h00 0.96362 0.40956 2.353 0.01863 *
> Time12h00 1.57906 0.49033 3.220 0.00128 **
> Time13h00 1.58951 0.40705 3.905 9.43e-05 ***
> Time14h00 1.07939 0.53876 2.003 0.04513 *
> TertileLOW -1.40906 0.53761 -2.621 0.00877 **
> TertileMEDIUM -1.24862 0.57396 -2.175 0.02960 *
> Max 0.10122 0.08611 1.175 0.23985
> Min 0.13439 0.10292 1.306 0.19162
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Correlation of Fixed Effects:
> (Intr) Tm1100 Tm1200 Tm1300 Tm1400 TrtLOW TMEDIU Max
> Time11h00 0.056
> Time12h00 0.258 0.447
> Time13h00 0.115 0.510 0.486
> Time14h00 -0.049 0.318 0.276 0.370
> TertileLOW -0.146 -0.119 -0.215 -0.236 -0.096
> TertlMEDIUM -0.128 0.024 -0.145 -0.155 -0.224 0.707
> Max -0.914 -0.162 -0.277 -0.198 -0.084 -0.025 -0.022
> Min 0.178 0.074 -0.023 0.105 0.244 -0.101 -0.077 -0.463
>
>> anova(m1,m2)
> Data: sdata
> Models:
> m2: prop ~ Time + (1 | Exp) + (1 | Indiv) + (1 | Zone) + Max + Min
> m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) +
> m1: Max + Min
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m2 10 216.72 252.92 -98.359
> m1 12 215.55 258.99 -95.773 5.1721 2 0.07532 .
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>> anova(m1,m3)
> Data: sdata
> Models:
> m3: prop ~ Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + Max +
> m3: Min
> m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) +
> m1: Max + Min
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m3 8 226.11 255.08 -105.057
> m1 12 215.55 258.99 -95.773 18.567 4 0.0009556 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>> anova(m1,m4)
> Data: sdata
> Models:
> m4: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) +
> m4: Min
> m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) +
> m1: Max + Min
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m4 11 214.81 254.64 -96.407
> m1 12 215.55 258.99 -95.773 1.2672 1 0.2603
>
>> anova(m1,m5)
> Data: sdata
> Models:
> m5: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) +
> m5: Max
> m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) +
> m1: Max + Min
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m5 11 215.22 255.05 -96.613
> m1 12 215.55 258.99 -95.773 1.6792 1 0.195
>
> As I understand this output, the only significant predictor in the
> model appears to be time of day. But, I don't really know how this
> should be reported. Can you point me to some papers or examples where
> lmer outputs have been reported formally? Any help that you could
> offer would be MOST appreciated.
I would try drop1() [works with version ...-40 of lme4, I believe] for
more convenient output.
What you are doing is reasonable, although likelihood ratio test
statistics are probably going to be somewhat liberal (anticonservative)
for "small" sample sizes. *If* all the predictors were represented
equally in all the random-effect levels (i.e. individuals, zones,
experimental blocks all had similar levels of Tertile, Min/Max temp,
time) then this would be close to a randomized-block design, and in the
classical sense you would have a fairly large residual N (as opposed to
a nested design, where at least some of the treatments don't differ
within blocks). LRT is probably appropriate with a caveat. If you are
paranoid, use parametric bootstrapping (see the examples for
?"simulate-mer").
>
> Sincerely (in desperation)
>
> Luke Duncan
>
> PhD Candidate
> School of Animal, Plant and Environmental Sciences
> University of the Witwatersrand
> Johannesburg, South Africa
>
> +27 11 717 6452
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk4xZjIACgkQc5UpGjwzenM5QACfSfnnneERl3Q4NN936N8i2/cl
TM8AniEQeCArTkuSf18p8qoTepFdEVQC
=Xc5e
-----END PGP SIGNATURE-----
More information about the R-sig-mixed-models
mailing list