[R-sig-ME] ML or REML for LR tests

Douglas Bates bates at stat.wisc.edu
Fri Aug 29 15:30:40 CEST 2008


On Fri, Aug 29, 2008 at 2:53 AM, Ken Beath <ken at kjbeath.com.au> wrote:
> On 29/08/2008, at 2:47 PM, Austin Frank wrote:
>
>> On Thu, Aug 28 2008, Doran, Harold wrote:
>>
>>>> The likelihood-ratio test approach directly compares these two.
>>>
>>> Since these models differ in their fixed effects, you need REML=FALSE
>>> for the LRT to be meaningful.
>>
>> This is a standard operating procedure that I picked up and accepted on
>> faith when I first started using lmer, before I really knew what I was
>> doing.  It occurs to me that this is the case for much of my
>> understanding of model comparison, so I'd like to check my understanding
>> of the use of LR tests with lmer.  If this is a case of RTFM, please
>> provide a pointer to the relevant Friendly Manual ;)
>>
>> 1) Can anyone offer a reference where the case is made for doing LR
>> tests on models fit by ML (as opposed to REML)?
>>
>
> Any decent mixed models text. Verbeke and Molenberghs "Linear Mixed Models
> for Longitudinal Data" p63 or Pinheiro and Bates "Mixed-Effects Models in S
> and S-Plus" p76.
>
>> 2) Can non-nested ML models with the same number of fixed effects be
>> meaningfully compared with an LR test?  Something like:
>>
>
> No. The general principle is that for an LR test the models must be nested.
>
>> --8<---------------cut here---------------start------------->8---
>> data(sleepstudy)
>> set.seed(535353)
>> sleepstudy$Fake <- rnorm(nrow(sleepstudy))
>> m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML=FALSE)
>> m2 <- lmer(Reaction ~ Fake + (1 | Subject), sleepstudy, REML=FALSE)
>> anova(m1, m2)                          # Is this test meaningful...
>>
>> ## When possible, test against superset model
>> m12 <- lmer(Reaction ~ Days + Fake + (1 | Subject),
>>           sleepstudy, REML=FALSE)
>> anova(m1, m2, m12)                     # ... or only this one?
>> --8<---------------cut here---------------end--------------->8---
>>
>> 3) Is it the case that LR tests between REML models with different
>> random effects are meaningful?  Does this apply to both nested and
>> non-nested models?

> Maybe, but only for nested models (see Q2). Supposedly REML works better than
> ML for this. The significance levels won't be exactly correct, but if the test
> is overwhelmingly significant then there is probably a random effect.
> Simulation seems a better idea.

It may help to give a bit of background about what happens in the
anova method for mer objects in the lme4 package.

When fitting a linear mixed model, both the profiled deviance
(negative twice the log-likelihood) and the profiled REML criterion
are evaluated at each iteration during the optimization.  The word
"profiled" means that, although the likelihood or the REML criterion
are defined as functions of the fixed-effects parameters,
$\boldmath{\beta}$, the common scale parameter, $\sigma$, and the
parameters $\boldmath{\theta}$ that determine the relative covariance
matrix $\boldmath{ \Sigma}$ of the random effects, these criteria are
evaluated as a function of $\boldmath{\theta}$ alone, with
$\boldmath{\beta}$ and $\sigma$ at their conditionally optimal values.
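
For concreteness (and glossing over notational details), the two
profiled criteria have roughly the form

  -2 \tilde{\ell}(\boldsymbol{\theta}) = \log|\mathbf{L}_\theta|^2
      + n \left[ 1 + \log\left( \frac{2\pi r^2(\boldsymbol{\theta})}{n} \right) \right]

  \mathrm{REML}(\boldsymbol{\theta}) = \log|\mathbf{L}_\theta|^2 + \log|\mathbf{R}_X|^2
      + (n - p) \left[ 1 + \log\left( \frac{2\pi r^2(\boldsymbol{\theta})}{n - p} \right) \right]

where $r^2(\boldsymbol{\theta})$ is the penalized residual sum of squares at
the conditionally optimal $\boldsymbol{\beta}$ and random effects,
$\mathbf{L}_\theta$ is the sparse Cholesky factor, $\mathbf{R}_X$ is the
Cholesky factor for the fixed-effects block, and $p$ is the number of columns
of X.  The $\log|\mathbf{R}_X|^2$ term is the one that matters for the next
point.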

The profiled REML criterion has a term that depends on the model
matrix X for the fixed-effects parameters.  If you change the
definition of the fixed-effects you will change the value of that
criterion in a systematic way that does not depend on how well the
respective models fit the observed data.  Thus, differences in the
REML criterion are not a meaningful way to compare two models that
differ in the definition of the fixed-effects.
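
To see this with the sleepstudy example from Austin's message, you can extract
the REML criterion from two REML fits that differ in their fixed effects (here
I assume deviance() with REML = TRUE returns the REML criterion, analogous to
the REML = FALSE call shown further down):

sleepstudy$Fake <- rnorm(nrow(sleepstudy))          # as in Austin's snippet
fm.days <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)  # REML fits (the default)
fm.fake <- lmer(Reaction ~ Fake + (1 | Subject), sleepstudy)
## Each value carries a term involving its own X, so the difference
## says nothing about how well the two models fit the data.
deviance(fm.days, REML = TRUE)
deviance(fm.fake, REML = TRUE)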

If you look at the code for the anova method for the mer class you
will see that it always calls logLik with REML = FALSE.  That is, it
extracts the log-likelihood from the profiled deviance even if the
model has been fit according to the REML criterion.  In general that
would be a very bad idea if we were using the deviance and not the
profiled deviance.  You can't count on the deviance at the REML
estimates for the complete parameter vector being close to the optimal
deviance.  However, the profiled deviance at the REML estimates of
$\boldsymbol{\theta}$ is close to the optimal profiled deviance.

For example,

> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
> ## profiled deviance at the REML estimate of $\theta$
> dput(d1 <- deviance(fm1, REML = FALSE))
structure(1751.98581110077, .Names = "ML")
> ## optimal value of the deviance
> dput(d2 <- deviance(update(fm1, REML = FALSE)))
structure(1751.93934480597, .Names = "ML")
> ## amount by which the profiled deviance at REML estimate
> ## exceeds the optimal value
> d1 - d2
        ML
0.04646629

As you can see, I have gravitated to using the term "REML criterion"
rather than calling it a deviance or a log-likelihood.  The print
method for mer objects still labels it as the REMLdeviance but I plan
to change that and create an additional extractor called REML to
return that value.  In that case the deviance extractor will always
return the deviance, perhaps after re-optimizing to produce the
optimal deviance.

So the bottom line is that the anova method for mer objects always
produces a likelihood ratio test based on the likelihood.  If you
compare models fit by REML, the likelihood ratio statistic will be
somewhat inaccurate but only by a small amount (in all the cases that
I have seen).
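
So if you want to be careful when the models differ in their fixed effects,
refit with ML before calling anova.  A minimal sketch (the particular pair of
formulas is just for illustration):

m.small <- lmer(Reaction ~ 1    + (Days | Subject), sleepstudy)  # REML fits
m.large <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## Refit both with ML so the likelihood ratio test compares like with like
anova(update(m.small, REML = FALSE),
      update(m.large, REML = FALSE))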



