[R-sig-ME] Teaching Mixed Effects
bolker at UFL.EDU
Tue Jan 20 15:59:01 CET 2009
Some comments (others will certainly have more to add)
From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Andrew Beckerman [a.beckerman at sheffield.ac.uk]
Sent: Tuesday, January 20, 2009 6:47 AM
To: R Models Mixed
Subject: [R-sig-ME] Teaching Mixed Effects
Dear R-Mixed people -
I am about to embark on a day of attempting to teach some aspects of
mixed models using R to PhD students. I was wondering if anyone would
be willing to indulge in this summary below, developed through reading
threads on R-Mixed and R-Help over the past few months, and vet my
list of issues/questions/topics (4) associated with mixed models?
Let me reduce any rising blood pressure by saying that I understand
(possibly) and accept why there are no p-values in lmer, and NONE of
the comments/questions below are about why lmer does not produce
sensible df's and p-values to calculate significance (Phew).
First, a technical question:
Based on these two threads:
IS mcmcsamp() broken for "complicated" random effects? Is it in good
enough shape to teach for "simple" gaussian mixed models, to
demonstrate the principle?
BMB: good question. I'd like to know, although I would
guess that it would be OK for demonstrating the principle.
You could try it out and see if it's sensible ...
Now, here is what I am possibly going to talk about.....
0) Rule number 1 is to design experiments well, and aim for
orthogonal, well replicated and balanced designs. If you get data
that conforms to all of that, old school F-ratio's CAN be used. If
not, see 1-4 below (we will assume that Rule number 1 will be broken).
BMB: good idea.
1) It is agreed that the Laplacian methods for estimating terms and
"likelihoods" in mixed effects models is considered most reliable
(accurate and precise).
BMB: I believe (but stand ready to be corrected) that
PQL vs Laplace vs adaptive Gauss-Hermite quadrature
(AGHQ) is an issue for GLMMs, not so much for LMMs.
AGHQ is generally even better (but slower) than Laplace,
which is a special case.
R (lme4) and ADMB model builder use these
methods. SAS nlmixed does, but SAS proc mixed does not appear to.
STATA can. Genstat does/can (see final note below**).
BMB: SAS PROC MIXED does, as of version 9.2
2) It is agreed that the appropriate test for fixed effects in mixed
models should be between nested models. However, there is no
agreement as how to characterise the distributions that would be used
to generate p-values. This is the crux of the Bates et al argument:
Likelihood Ratio Tests, Wald tests etc all need to assume a
distribtion and some degrees of freedom. But, in many mixed models,
the distribution need not conform to any of our standard ones (t,F,
Chi-square etc), especially when the number of subjects in the random
effects is small. Moreover, the relationship between fixed and random
effects means that it is nearly impossible, and perhaps not worthwhile
to calcuate what might be appropriate "degrees of freedom".
BMB: and specifically, if you happily use the anova() method
to calculate LRT it will do so -- but Pinheiro and Bates 2000 expressly
warn against the results/show that they can be "anticonservative"
for small sample sizes. If you have huge sample sizes (i.e.
you're not a field ecologist) then LRT may be OK (I seem to remember
Bates using it without comment in an analysis of a (large)
Bangladeshi arsenic data set).
2.1) However, Bates et al have mentioned the restricted likelihood
ratio test. There is a package in R implementing some of these tools
(RLRsim), but these appear to be limited to and or focused on tests of
BMB: You can test fixed effects, apparently, but only *in combination with*
a test of the random effect (the null hypothesis is always a model
without random effects). Also limited to LMMs, and a single
2.2) What some "other" packages do: SAS can produce wald tests and
LRT's if you want, and can implement the kenward-rogers adjustement.
BMB: Kenward-Roger ! (not Rogers)
There is some theory behind the K-R, but it is still not dealing with
the crux of the problem (see 2). Genstat uses wald tests and warns
you that with small numbers of subjects, these are not reliable.
Genstat is also experimenting with HGLM by Nelder and Lee (see **)
2.3) "Testing" random effects is considered inappropriate (but see 2.1
BMB: I don't think this is necessarily true. Admittedly it is a point
null hypothesis (variance will never be _exactly_ zero), but I
can certainly see cases ("does variation among species contribute
significantly to the overall variance observed"?) where one would
want to test this question. This is a bit murky but I think the
distinction is often between random effects as part of an experimental
design (no point in testing, not interesting) and random effects
as observational data.
3) Because of 2, there is the resounding argument that bayesian and or
simulation/bootstrapping tools be used to evalaute fixed effects.
BMB: I don't know about "resounding", but OK. Probably
the best option.
Current methods proposed and coded, but in various states of
mcmcsamp() and HPDinterval() from lme4 + baayen *.fnc's from languageR,
BUGS and winBugs,
RH Baayen's simulation tools (e.g. page 307 method)
Andrew Gelman and Jennifer Hill's tools (e.g. sim() method from
Ben Bolker's suggestions in this list for glmm's (thread: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001459.html)
3.1) These all evalaute "simple" tests of whether beta's and intercept
are different than 0, and are linked to the contrasts. There is no
emerging method equivalent to a LRT (but see 2.1 and **Final Note
BMB: I think if you want to calculate the parametric bootstrap/
null-hypothesis simulation of the change in deviances between
nested models, it's actually reasonably straightforward. See
examples on glmm.wikidot.com , especially
[geared toward GLMMs but even easier for LMMs]
4) Andrew Gelman et al also suggest AIC based methods and model
averaging for model inference, given constant random effects. I think
their argument about AIC is that if the "likelihood" is estimated
well, relative differences in AIC will be constant, irrespective of
any adjustement made to numbers of paramters used in calculating AIC:
i.e. as long as the random effects structure stays the same, the
relative differences between nested models will not change if the
number of paramters is estimated consistently. These methods still do
not produce p-values.
BMB: and AIC is an asymptotic method anyway, like
LRTs ... which means it is likely to have the same problems, but I don't
think anyone has evaluated them. If you want to use the finite-size
corrections (AICc) then you are back in the situation of guessing
at residual degrees of freedom ...
**Final Note Below - I have noticed a relative lack of discussion of
Nelder et al's H-likelihood and their methods to generate a general
method for all heirarchical modelling (HGLM?!). Would anybody be able
to comment? A recent paper (http://www.springerlink.com/content/17p17r046lx4053r/fulltext.pdf
) that is somewhat beyond my skills, indicates the use of Laplace
methods to estimate likelihoods in heirarchical models and various
capacity for model inference.
BMB: I think HGLMs are a very promising way of *estimating*
the parameters of GLMMs (I too wonder why they don't seem to
be discussed much), but I don't think they get us any farther forward
Thanks again, in advance, to anyone who took this on..... apologies
for any glaring errors or assignment of ideas to people incorrectly.
Dr. Andrew Beckerman
Department of Animal and Plant Sciences, University of Sheffield,
Alfred Denny Building, Western Bank, Sheffield S10 2TN, UK
ph +44 (0)114 222 0026; fx +44 (0)114 222 0002
R-sig-mixed-models at r-project.org mailing list
More information about the R-sig-mixed-models