[R-sig-ME] Changing the displayed values of logLik, AIC and BIC for linear mixed models fit by REML

Douglas Bates bates at stat.wisc.edu
Tue May 19 22:25:52 CEST 2009


In writing about the statistics displayed for a linear mixed model I
have again run into an inconsistency in the definition of the
log-likelihood and the AIC and BIC criteria.  In the past some users
have noticed that the AIC, BIC and log-likelihood are displayed
differently in the summary of a model fit by REML and in the anova
comparison of two such models.  Consider

> fm1
Linear mixed model fit by REML
Formula: Reaction ~ Days + (Days | Subject)
   Data: sleepstudy
  AIC  BIC logLik deviance REMLdev
 1756 1775 -871.8     1752    1744
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 618.756  24.8748
          Days         35.572   5.9642  0.022
 Residual             654.075  25.5749
Number of obs: 180, groups: Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.366      6.867   36.60
Days          10.467      1.555    6.73

Correlation of Fixed Effects:
     (Intr)
Days -0.170
> fm2
Linear mixed model fit by REML
Formula: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
   Data: sleepstudy
  AIC  BIC logLik deviance REMLdev
 1754 1770 -871.8     1752    1744
Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 627.520  25.0503
 Subject  Days         35.859   5.9883
 Residual             653.587  25.5653
Number of obs: 180, groups: Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.885   36.51
Days          10.467      1.560    6.71

Correlation of Fixed Effects:
     (Intr)
Days -0.184
> anova(fm2,fm1)
Data: sleepstudy
Models:
fm2: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
fm1: Reaction ~ Days + (Days | Subject)
    Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
fm2  5 1762.05 1778.01 -876.02
fm1  6 1764.04 1783.19 -876.02 0.0104      1     0.9188

Note that the logLik for fm2 is -871.8 in the model summary and
-876.02 in the anova output.

Now the short answer to why this happens is that differences in the
REML criterion do not always behave like differences in the deviance;
in particular, REML criteria are not comparable between models with
different fixed-effects specifications.  To be on the safe side, the
criteria in the anova output are derived from the deviance, not the
REML criterion.  (The name REMLdev means the REML criterion expressed
on the deviance scale.)  In the summary the quantity labeled "logLik"
is negative half the REMLdev whereas in the anova output it is
negative half the deviance.  The AIC and BIC criteria are derived
from the value of the logLik in both cases.
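
As a check, here is a small sketch in R that reproduces the printed
criteria for fm1 from the two objective functions, using only the
values shown above (summary logLik -871.8, anova logLik -876.02,
npar = 6, n = 180).  Both displays use AIC = -2*logLik + 2*npar and
BIC = -2*logLik + log(n)*npar; only the logLik they start from
differs.

REMLdev <- 2 * 871.8     # REML criterion on the deviance scale (printed 1744)
dev     <- 2 * 876.02    # deviance at the REML estimates (printed 1752)
npar <- 6; n <- 180
REMLdev + 2 * npar       # 1755.6, the summary AIC (printed 1756)
REMLdev + log(n) * npar  # 1774.8, the summary BIC (printed 1775)
dev + 2 * npar           # 1764.04, the AIC in the anova table
dev + log(n) * npar      # 1783.2, the BIC in the anova table (1783.19)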

I am considering changing the display of the summary so that the
logLik, AIC and BIC are always the values corresponding to the
deviance.  The disadvantage of doing this is that the display will be
inconsistent with earlier fits of the same model.  Also, the deviance
quoted for a REML fit is not the minimum possible deviance, which is
the deviance from the ML fit.  However, it is usually quite close to
the minimum.  This may seem peculiar: the parameter estimates from
the REML and ML fits can be quite different, so why should the
deviances be similar?  The trick is that the deviance and REMLdev
values are the profiled deviance and the profiled REML criterion,
which depend only on a reduced parameter vector (the relative
covariance parameters).  Those parameters don't change much between
the REML and the ML fits.
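
One way to see how close the values are is to refit by maximum
likelihood and compare (a sketch; lmer's REML = FALSE argument
requests an ML fit):

fm1ML <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              REML = FALSE)
deviance(fm1ML)  # the minimum ML deviance; expect a value only
                 # slightly below the 1752 quoted above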

The advantage of making the change is that the model summary and the
anova results would be consistent.  Also, I think that the quantity
labeled "logLik" should be a log-likelihood, not just something that
kind of behaves like a log-likelihood.

I'm trying to think of what the downside of making the change would
be.  Obviously it would be upsetting to get apparently different
results from those obtained with previous versions of lme4, but I
think that the explanation of the inconsistency could be propagated
widely enough for users to learn of it.  Would the results be
inconsistent with those reported by SAS PROC MIXED, SPSS, Stata, HLM,
MLwiN, etc.?  Are there good reasons for wanting to evaluate negative
half the REMLdev value?  I have seen discussion of REML-based tests
and I remember that Ahn and Reinsel described advantages for them but
I have forgotten the details.  I believe I read a message on this
list about an R package for REML-based tests but I can't seem to find
it now.  Can anyone refresh my failing memory?



