# [R-sig-ME] understanding log-likelihood/model fit

Douglas Bates bates at stat.wisc.edu
Thu Aug 21 17:41:44 CEST 2008

On Thu, Aug 21, 2008 at 7:35 AM, Daniel Ezra Johnson
<danielezrajohnson at gmail.com> wrote:
> Thank you all for your help.
>
> I'm now referring back to the discussion in Chapter 2 of Pinheiro and
> Bates and understanding this much better.
> Well, a little better.
>
> In the figures on pp. 73-74, the middle panels (log-residual norm)
> seem to illustrate what Douglas Bates has described here as
>
>> "the penalty depend[ing] on the (unconditional) variance
>> covariance matrix of the random effects.  When the variances are small
>> there is a large penalty.  When the variances are large there is a
>> small penalty on the size of the random effects."
>
> And the bottom panels (log-determinant ratio) seem to illustrate
>
>> The measure of model
>> complexity, which is related to the determinant of the conditional
>> variance of the random effects, given the data, [which] has the opposite
>> behavior.  When the variance of the random effects is small the model
>> is considered simpler.  The simplest possible model on this scale is
>> one without any random effects at all, corresponding to a variance of
>> zero.
>
> In these charts, as you move all the way to the right, in the limit,
> the values of Delta and theta are maximized, which I believe means the
> random effect variance goes to zero (with respect to the residual
> variance).
>
> As you move to the left, your model complexity gets worse, but your
> model fidelity improves for a time, and that's where you get the
> maximum log-likelihood (top panel).
>
> If theta going to infinity represents zero random effects, could you
> say that theta going to zero represents random effects that are no
> longer distinguishable from fixed effects?

Yes, in the sense that the residual sum of squares is what you would
obtain from a fixed-effects fit whose model matrix is [X, Z], where X
is the model matrix for the fixed-effects parameters and Z is the
model matrix for the random effects.  The parameter estimates for
this model may not be well defined because the model could be
overparameterized.  However, the residual sum of squares for this
model is well-defined, as are the predictions and the residuals.
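
For concreteness, with the Dyestuff data used below, such a fit could
be obtained as

> fm0 <- lm(Yield ~ Batch, Dyestuff)   # model matrix is effectively [X, Z]
> sum(resid(fm0)^2)                    # the well-defined residual sum of squares

where the individual coefficients depend on the contrasts chosen for
Batch but the fitted values and residuals do not.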

You could produce a similar plot for an lmer model fit.  Consider the
Dyestuff data in current versions of the lme4 package.  The REML
criterion is similar to the maximum likelihood criterion but involves
an additional term, so let's consider only the ML deviance and not the
REML deviance.

> options(show.signif.stars = FALSE)
> (fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff, REML = FALSE, verbose = TRUE))
0:     327.33176: 0.730297
1:     327.33082: 0.772944
2:     327.32706: 0.753271
3:     327.32706: 0.752556
4:     327.32706: 0.752581
Linear mixed model fit by maximum likelihood
Formula: Yield ~ 1 + (1 | Batch)
Data: Dyestuff
  AIC   BIC logLik deviance REMLdev
333.3 337.5 -163.7    327.3   319.7
Random effects:
Groups   Name        Variance Std.Dev.
Batch    (Intercept) 1388.1   37.258
Residual             2451.3   49.511
Number of obs: 30, groups: Batch, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)  1527.50      17.69   86.33

The verbose output gives the profiled deviance at each iteration as a
function of a single parameter, which is the ratio of the standard
deviation of the random effects to the residual standard deviation.

> 37.258/49.511
[1] 0.7525196

This parameter is stored in the ST slot.  The slot is a list with
elements corresponding to the random effects terms in the model.  In
this model there is only one random effects term, (1|Batch).  In
general the elements of the list are square matrices where the number
of rows and columns corresponds to the number of random effects from
that term for each level of the grouping factor.  In this case there
is only one random effect per level of the factor.

> fm1@ST
[[1]]
            (Intercept)
(Intercept)   0.7525135
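
For a term with more than one random effect per level of the grouping
factor the element is a matrix.  For example, with the sleepstudy data
(also in the lme4 package) a random intercept and a random slope per
subject would produce a single 2 by 2 element:

> fm2 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy, REML = FALSE)
> fm2@ST   # a one-element list containing a 2 by 2 matrix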

The deviance slot contains the deviance itself and several of its components.

> fm1@deviance
          ML         REML         ldL2        ldRX2      sigmaML    sigmaREML
  327.327060   319.725920     8.059354     2.057972    49.510754    50.357153
       pwrss         disc         usqr         wrss
72.. 73539.442307 62669.199627 10870.242680 62669.199627

For a linear mixed model, the discrepancy (disc) at the current
parameter values is equal to the weighted residual sum of squares
(wrss).  The random effects, b, are linearly transformed to u, whose
unconditional distribution is a "spherical" Gaussian (i.e. the
variance-covariance matrix is \sigma^2 times the q by q identity
matrix, where q is the total number of random effects and \sigma^2 is
the same scalar variance as in the conditional distribution of Y|B).
The penalty on the size of the random effects is therefore the
squared length of u (usqr).  The penalized, weighted residual sum
of squares is the sum of wrss and usqr.  The conditional estimate of
\sigma^2 is pwrss/n and sigmaML is its square root.

> d <- fm1@deviance
> all.equal(unname(d['pwrss']), unname(d['wrss'] + d['usqr']))
[1] TRUE
> fm1@dims['n']
 n
30
> all.equal(unname(d['sigmaML']), sqrt(unname(d['pwrss']/30)))
[1] TRUE

The log-determinant of the conditional variance-covariance matrix of
the random effects, given the data (i.e. of B|Y), is, up to the scale
factor of \sigma, ldL2.  The profiled deviance is calculated as shown
on slide 132 of http://www.stat.wisc.edu/~bates/PotsdamGLMM/LMMD.pdf

The actual code in the C function lmm_update_fixef_u in lme4/src/lmer.c is

    d[ML_POS] = d[ldL2_POS] +
        dn * (1 + log(d[pwrss_POS]) + log(2 * PI / dn));

or, in R,

> unname(d['ldL2'] + 30 * (1 + log(2 * pi * d['pwrss']/30)))
[1] 327.3271
> all.equal(unname(d['ML']), unname(d['ldL2'] + 30 * (1 + log(2 * pi * d['pwrss']/30))))
[1] TRUE

Next I was going to show how to change the value of the ST parameter
and update the deviance slot but in the process I discovered that one
of the C functions (the aforementioned lmm_update_fixef_u) cannot be
called from R.  After changing that and making a new release I can
show how to create a similar figure in the new formulation.
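
In the meantime, a sketch of what that demonstration could look like,
assuming the devFunOnly interface of later lme4 releases, which
returns the profiled deviance directly as a function of this
parameter:

> dd <- lmer(Yield ~ 1 + (1|Batch), Dyestuff, REML = FALSE, devFunOnly = TRUE)
> dd(0.752581)   # should reproduce the minimum deviance, about 327.327
> sapply(c(0.1, 0.5, 0.7525, 1, 2), dd)   # deviance along the parameter axis

Plotting the second line against the parameter values would give a
figure analogous to the top panels on pp. 73-74 of Pinheiro and Bates.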