[R-sig-ME] ML or REML for LR tests

Wed Sep 10 15:37:56 CEST 2008

On Wed, Sep 10, 2008 at 1:49 AM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
> Douglas -
> I am taking up your reply to the question that Austin
> Frank asked on August 29.  I'd like to check that my
> understanding of the reply is correct.

> Looking at page 59 of Pinheiro & Bates.  \theta is a
> parameterization of the ratio of the inverse of the variance-
> covariance matrix of the random effects, multiplied by \sigma^2,
> e.g., in a model with 3 levels of random effects,
> \theta = (\sigma/\sigma_1, \sigma/\sigma_2) might do the job.

More or less.  In lme4 the parameterization is the inverse of what you
have written.  That is, individual components of \theta are of the
form \sigma_1/\sigma. This change was made because we want to allow
for \sigma_1 to be zero whereas it is not important for \sigma to be
allowed to be zero.

> Both the profiled deviance and the profiled REML criterion
> are evaluated at each step of the optimization, albeit
> at the current REML estimate of \theta.  (The ML deviance
> can be determined without explicitly calculating the ML
> estimates of the fixed and random effects.)

I'm not sure what the part about "at the current REML estimate of
\theta" means.  For a given value of \theta the profiled deviance and
the REML criterion can both be evaluated.  The REML estimate of \theta
optimizes the profiled REML criterion.  The ML estimate of \theta
optimizes the profiled deviance.  The value of the profiled deviance
at the REML estimate of \theta will be greater than the profiled
deviance at the ML estimate but usually the difference in these
deviances is very small.

> It makes little difference whether one evaluates the
> likelihood deviance at the REML estimate of \theta or at
> the ML estimate.  More importantly for the anova table,
> the difference in deviances is not much altered.

Exactly.

> Thanks in anticipation

Thanks for the clarification.

> The reply was:
>
> (From Doug Bates - 29 August 2008)
>
> It may help to give a bit of background about what happens in the
> anova method for mer objects in the lme4 package.
>
> When fitting a linear mixed model, both the profiled deviance
> (negative twice the log-likelihood) and the profiled REML criterion
> are evaluated at each iteration during the optimization.  The word
> "profiled" means that, although the likelihood or the REML criterion
> are defined as functions of the fixed-effects parameters,
> $\boldmath{\beta}$, the common scale parameter, $\sigma$, and the
> parameters $\boldmath{\theta}$ that determine the relative covariance
> matrix $\boldmath{ \Sigma}$ of the random effects, these criteria are
> evaluated as a function of $\boldmath{\theta}$ alone, with
> $\boldmath{\beta}$ and $\sigma$ at their conditionally optimal values.
>
> The profiled REML criterion has a term that depends on the model
> matrix X for the fixed-effects parameters.  If you change the
> definition of the fixed-effects you will change the value of that
> criterion in a systematic way that does not depend on how well the
> respective models fit the observed data.  Thus, differences in the
> REML criterion are not a meaningful way to compare two models that
> differ in the definition of the fixed-effects.
>
> If you look at the code for the anova method for the mer class you
> will see that it always calls logLik with REML = FALSE.  That is, it
> extracts the log-likelihood from the profiled deviance even if the
> model has been fit according to the REML criterion.  In general that
> would be a very bad idea if we were using the deviance and not the
> profiled deviance.  You can't count on the deviance at the REML
> estimates for the complete parameter vector being close to the optimal
> deviance.  However, the profiled deviance at the REML estimates of
> $\boldmath{\theta}$ is close to the optimal profiled deviance.
>
> For example,
>
> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
> ## profiled deviance at the REML estimate of $\theta$
> dput(d1 <- deviance(fm1, REML = FALSE))
> structure(1751.98581110077, .Names = "ML")
> ## optimal value of the deviance
> dput(d2 <- deviance(update(fm1, REML = FALSE)))
> structure(1751.93934480597, .Names = "ML")
> ## amount by which the profiled deviance at REML estimate
> ## exceeds the optimal value
> d1 - d2
>    ML
> 0.04646629
>
> As you can see, I have gravitated to using the term "REML criterion"
> rather than calling it a deviance or a log-likelihood.  The print
> method for mer objects still labels it as the REMLdeviance but I plan
> to change that and create an additional extractor called REML to
> return that value.  In that case the deviance extractor will always
> return the deviance, perhaps after re-optimizing to produce the
> optimal deviance.
>
> So the bottom line is that the anova method for mer objects always
> produces a likelihood ratio test based on the likelihood.  If you
> compare models fit by REML that likelihood ratio statistic will be
> somewhat inaccurate but only by a small amount (in all the cases that
> I have seen).
>