[R-sig-ME] understanding log-likelihood/model fit
Douglas Bates
bates at stat.wisc.edu
Thu Aug 21 17:41:44 CEST 2008
On Thu, Aug 21, 2008 at 7:35 AM, Daniel Ezra Johnson
<danielezrajohnson at gmail.com> wrote:
> Thank you all for your help.
>
> I'm now referring back to the discussion in Chapter 2 of Pinheiro and
> Bates and understanding this much better.
> Well, a little better.
>
> In the figures on pp. 73-74, the middle panels (log-residual norm)
> seem to illustrate what Douglas Bates has described here as
>
>> "the penalty depend[ing] on the (unconditional) variance
>> covariance matrix of the random effects. When the variances are small
>> there is a large penalty. When the variances are large there is a
>> small penalty on the size of the random effects."
>
> And the bottom panels (log-determinant ratio) seem to illustrate
>
>> The measure of model
>> complexity, which is related to the determinant of the conditional
>> variance of the random effects, given the data, [which] has the opposite
>> behavior. When the variance of the random effects is small the model
>> is considered simpler. The simplest possible model on this scale is
>> one without any random effects at all, corresponding to a variance of
>> zero.
>
> In these charts, as you move all the way to the right, in the limit,
> the values of Delta and theta are maximized, which I believe means the
> random effect variance goes to zero (with respect to the residual
> variance).
>
> As you move to the left, your model complexity gets worse, but your
> model fidelity improves for a time, and that's where you get the
> maximum log-likelihood (top panel).
>
> If theta going to infinity represents zero random effects, could you
> say that theta going to zero represents random effects that are no
> longer distinguishable from fixed effects?
Yes, in the sense that the residual sum of squares is what you would
obtain from a fixed-effects fit whose model matrix is [X, Z], where X
is the model matrix for the fixed-effects parameters and Z is the
model matrix for the random effects.  The parameter estimates for
this model may not be well defined because the model could be
overparameterized. However, the residual sum of squares for this
model is well-defined, as are the predictions and the residuals.
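This limiting residual sum of squares can be checked directly.  A
sketch, assuming the Dyestuff data from the lme4 package: for the
model below, Z is just the set of indicator columns for Batch, so the
[X, Z] fit reduces to an ordinary fixed-effects fit with Batch as a
factor (lm() drops the aliased column, which changes the coefficients
but not the fitted values or residuals).

```r
## Sketch: the unpenalized limit corresponds to a fixed-effects fit
## whose model matrix combines X (the intercept) and Z (the Batch
## indicators).  Assumes the Dyestuff data from the lme4 package.
data(Dyestuff, package = "lme4")
fm0 <- lm(Yield ~ Batch, Dyestuff)  # columns of [X, Z], aliased ones dropped
rss <- sum(residuals(fm0)^2)        # well-defined despite the redundancy
rss
```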
You could produce a similar plot for an lmer model fit.  Consider the
Dyestuff data in current versions of the lme4 package.  The REML
criterion is similar to the ML deviance but involves an additional
term, so let's consider only the ML deviance, not the REML deviance.
> options(show.signif.stars = FALSE)
> (fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff, REML = FALSE, verbose = TRUE))
0: 327.33176: 0.730297
1: 327.33082: 0.772944
2: 327.32706: 0.753271
3: 327.32706: 0.752556
4: 327.32706: 0.752581
Linear mixed model fit by maximum likelihood
Formula: Yield ~ 1 + (1 | Batch)
Data: Dyestuff
   AIC   BIC logLik deviance REMLdev
 333.3 337.5 -163.7    327.3   319.7
Random effects:
 Groups   Name        Variance Std.Dev.
 Batch    (Intercept) 1388.1   37.258
 Residual             2451.3   49.511
Number of obs: 30, groups: Batch, 6
Fixed effects:
            Estimate Std. Error t value
(Intercept)  1527.50      17.69   86.33
The verbose output gives the profiled deviance at each iteration as a
function of a single parameter, which is the ratio of the standard
deviation of the random effects to the residual standard deviation.
> 37.258/49.511
[1] 0.7525196
This parameter is stored in the ST slot. The slot is a list with
elements corresponding to the random effects terms in the model. In
this model there is only one random effects term, (1|Batch). In
general the elements of the list are square matrices where the number
of rows and columns corresponds to the number of random effects from
that term for each level of the grouping factor. In this case there
is only one random effect per level of the factor.
> fm1@ST
[[1]]
(Intercept)
(Intercept) 0.7525135
The deviance slot contains the deviance itself and several of its components.
> fm1@deviance
          ML         REML         ldL2        ldRX2      sigmaML    sigmaREML
  327.327060   319.725920     8.059354     2.057972    49.510754    50.357153
       pwrss         disc         usqr         wrss
73539.442307 62669.199627 10870.242680 62669.199627
For a linear mixed model, the discrepancy (disc) at the current
parameter values is equal to the weighted residual sum of squares
(wrss).  The random effects, b, are linearly transformed to u, for
which the unconditional distribution is a "spherical" Gaussian
distribution, i.e. the variance-covariance matrix is \sigma^2 times
the q by q identity matrix, where \sigma^2 is the same scalar variance
as in the definition of the conditional distribution of Y|B.  The
penalty on the size of the random effects is therefore
the squared length of u (usqr). The penalized, weighted residual sum
of squares is the sum of wrss and usqr. The conditional estimate of
\sigma^2 is pwrss/n and sigmaML is its square root.
> d <- fm1 at deviance
> all.equal(unname(d['pwrss']), unname(d['wrss'] + d['usqr']))
[1] TRUE
> fm1@dims['n']
n
30
> all.equal(unname(d['sigmaML']), unname(sqrt(d['pwrss']/30)))
[1] TRUE
The log-determinant of the conditional variance-covariance matrix of
the random effects, given the data (i.e. of B|Y), is, up to the scale
factor of \sigma, ldL2.  The profiled deviance is calculated as shown in
slide 132 of http://www.stat.wisc.edu/~bates/PotsdamGLMM/LMMD.pdf
The actual code in the C function lmm_update_fixef_u in lme4/src/lmer.c is
d[ML_POS] = d[ldL2_POS] +
dn * (1 + log(d[pwrss_POS]) + log(2 * PI / dn));
or, in R,
> unname(d['ldL2'] + 30 * (1 + log(2 * pi * d['pwrss']/30)))
[1] 327.3271
> all.equal(unname(d['ML']), unname(d['ldL2'] + 30 * (1 + log(2 * pi * d['pwrss']/30))))
[1] TRUE
Next I was going to show how to change the value of the ST parameter
and update the deviance slot but in the process I discovered that one
of the C functions (the aforementioned lmm_update_fixef_u) cannot be
called from R. After changing that and making a new release I can
show how to create a similar figure in the new formulation.
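For readers with a current lme4 (version >= 1.0, so not the 2008 code
discussed above), here is a sketch of how such a figure could be
produced: devFunOnly = TRUE makes lmer() return the profiled deviance
as a function of theta, which for this model is the scalar ratio
sd(Batch)/sd(residual).

```r
## Sketch using the modern lme4 (>= 1.0) interface: devFunOnly = TRUE
## returns the profiled deviance as a function of theta rather than a
## fitted model.
library(lme4)
devf  <- lmer(Yield ~ 1 + (1 | Batch), Dyestuff,
              REML = FALSE, devFunOnly = TRUE)
theta <- seq(0, 2, length.out = 41)     # grid of sd ratios, including 0
dev   <- vapply(theta, devf, numeric(1))
## the minimum should lie near the estimated theta of about 0.7526
theta[which.min(dev)]
```

Plotting dev against theta then gives the analogue of the profile
figures on pp. 73-74 of Pinheiro and Bates.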