[R-sig-ME] GLMM estimation readings?

Tue Apr 5 20:58:56 CEST 2016

Hi Paul,

I recently added fitting of GLMMs to the MixedModels package for Julia
(https://github.com/dmbates/MixedModels.jl).  It is by no means finished, but
it does fit a couple of examples.

I mention this because the Julia code is, I feel, easier to read than is
the C++ code that implements GLMM fitting in the lme4 package, although
being easier to read than that C++ code is a low bar.

If you click through to src/PIRLS.jl you will see that the Laplace
approximation to the deviance is defined as
- the sum of the (squared) deviance residuals at the current values of beta
and u
- plus the squared length of u, where b = Lambda * u
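- plus the logarithm of the determinant of Lambda'Z'WZ * Lambda + I, where
W is the diagonal matrix of working weights at the conditional modes (W = I
for a linear mixed model)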
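
Written as one formula, the three terms above add up as follows. This is only a transcription of the list into symbols: u-tilde (the conditional modes), d_i (the deviance residuals) and the label on the left-hand side are notation chosen for this sketch, and W is the diagonal matrix of working weights at the conditional modes (W = I for a linear mixed model).

```latex
-2\,\tilde{\ell}(\beta,\theta)
  \;=\; \sum_{i=1}^{n} d_i(\beta,\tilde{u})^2
  \;+\; \lVert \tilde{u} \rVert^2
  \;+\; \log\left| \Lambda' Z' W Z \Lambda + I \right| ,
\qquad b = \Lambda u .
```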

The model provides the conditional distribution of Y given B = b (or,
equivalently, U = u) and the unconditional distribution of B (or U).  From
those we get the joint density of Y and B (or Y and U). From the joint
density we obtain the conditional density of U given Y = y up to a scaling
factor.  I call this the unscaled conditional density.  The likelihood for
beta and theta (the covariance parameter vector that determines Lambda) is
the integral of this unscaled conditional density.
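
In symbols, the chain of densities described above can be sketched like this. The name h for the unscaled conditional density is an assumption made here for brevity, and U is taken to be standard multivariate normal, which is what the "squared length of u" penalty implies.

```latex
f_{Y,U}(y,u) = f_{Y\mid U}(y \mid u)\, f_U(u), \qquad U \sim \mathcal{N}(0, I),
\\
f_{U\mid Y}(u \mid y) \;\propto\; h(u) := f_{Y\mid U}(y \mid u)\, f_U(u),
\\
L(\beta,\theta \mid y) = \int h(u)\, du .
```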

Because the unconditional density of U is multivariate normal, the unscaled
conditional density ends up being pretty close to a scaled multivariate
normal.  We approximate the unscaled conditional density by a scaled
multivariate normal that matches the value at the peak (i.e. the
conditional modes of the random effects) and the Hessian at that point.
This is the Laplace approximation.  Adaptive Gauss-Hermite quadrature
refines this approximation to the integral by also evaluating the unscaled
conditional density at other points near the peak.
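
To make the difference concrete, here is a minimal sketch in Python (standard library only) for the simplest case: one group with k successes out of n binary responses, a logit link, a fixed intercept beta0 and a scalar random intercept theta * u with u ~ N(0, 1). None of this is code from MixedModels.jl or lme4; the function names, the toy data and the hard-coded 5-point Gauss-Hermite rule are assumptions made for illustration. It compares the Laplace approximation and a 5-point adaptive Gauss-Hermite rule against a brute-force trapezoid integral of the unscaled conditional density.

```python
import math

# 5-point Gauss-Hermite rule for the weight function exp(-x^2).
GH5_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
             0.9585724646138185, 2.0201828704560856]
GH5_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
               0.3936193231522412, 0.01995324205904591]

def log_h(u, k, n, beta0, theta):
    """Log of the unscaled conditional density: log f(y | u) + log phi(u)."""
    eta = beta0 + theta * u
    log_lik = k * eta - n * math.log1p(math.exp(eta))
    return log_lik - 0.5 * u * u - 0.5 * math.log(2.0 * math.pi)

def mode_and_hessian(k, n, beta0, theta, iters=50):
    """Newton's method for the conditional mode; returns (mode, -d2 log h / du2)."""
    u = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-(beta0 + theta * u)))
        grad = theta * (k - n * p) - u
        hess = theta * theta * n * p * (1.0 - p) + 1.0
        u += grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta0 + theta * u)))
    return u, theta * theta * n * p * (1.0 - p) + 1.0

def laplace(k, n, beta0, theta):
    """Match the height and curvature of h at its peak with a scaled normal."""
    u_hat, hess = mode_and_hessian(k, n, beta0, theta)
    return math.exp(log_h(u_hat, k, n, beta0, theta)) * math.sqrt(2.0 * math.pi / hess)

def agq(k, n, beta0, theta, nodes, weights):
    """Adaptive Gauss-Hermite: nodes are centred at the mode and scaled by the curvature."""
    u_hat, hess = mode_and_hessian(k, n, beta0, theta)
    scale = math.sqrt(2.0 / hess)
    total = 0.0
    for x, w in zip(nodes, weights):
        total += w * math.exp(x * x + log_h(u_hat + scale * x, k, n, beta0, theta))
    return scale * total

def brute_force(k, n, beta0, theta, lo=-12.0, hi=12.0, steps=48000):
    """Trapezoid-rule reference value for the integral of h."""
    step = (hi - lo) / steps
    total = 0.5 * (math.exp(log_h(lo, k, n, beta0, theta)) +
                   math.exp(log_h(hi, k, n, beta0, theta)))
    for i in range(1, steps):
        total += math.exp(log_h(lo + i * step, k, n, beta0, theta))
    return total * step

# Toy data (hypothetical): 4 successes out of 5, beta0 = 0, theta = 1.
reference = brute_force(4, 5, 0.0, 1.0)
lap = laplace(4, 5, 0.0, 1.0)
agq5 = agq(4, 5, 0.0, 1.0, GH5_NODES, GH5_WEIGHTS)
print("relative error, Laplace:", abs(lap - reference) / reference)
print("relative error, 5-point AGQ:", abs(agq5 - reference) / reference)
```

With a single node at 0 and weight sqrt(pi), agq reduces to laplace, which is one way to see that the Laplace approximation is the one-point case of adaptive Gauss-Hermite quadrature.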

The conditional modes are determined via penalized iteratively reweighted
least squares (PIRLS).  See the function pirls! in the PIRLS.jl file. (The
function name ending in ! is an indication that it is a mutating function.
That is, it modifies the values of one or more of its arguments.)
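
As a rough illustration of what a PIRLS iteration does (this is not the pirls! function from PIRLS.jl), here is a minimal Python sketch for a logistic model with a scalar random intercept per group. The function name, arguments and toy data are hypothetical. Because each observation belongs to exactly one group, Z'WZ is diagonal and the penalized weighted least squares system can be solved group by group. Unlike a production implementation, the sketch takes full steps with no step-halving safeguard.

```python
import math

def pirls(y, group, offset, theta, n_groups, tol=1e-10, max_iter=100):
    """Conditional modes u for eta_i = offset_i + theta * u[group_i], u_j ~ N(0, 1)."""
    u = [0.0] * n_groups
    for _ in range(max_iter):
        ztwz = [0.0] * n_groups   # diagonal of Z'WZ
        ztwr = [0.0] * n_groups   # Z'W times the working response
        for yi, g, off in zip(y, group, offset):
            eta = off + theta * u[g]
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)                      # IRLS working weight
            working = (eta - off) + (yi - mu) / w    # working response, offset removed
            ztwz[g] += w
            ztwr[g] += w * working
        # Penalized weighted least squares step:
        # (Lambda'Z'WZ Lambda + I) u = Lambda'Z'W r, with Lambda = theta * I.
        new_u = [theta * ztwr[j] / (theta * theta * ztwz[j] + 1.0)
                 for j in range(n_groups)]
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            break
    return u

# Toy data (hypothetical): three groups of four binary responses each.
y = [1, 1, 1, 1,  1, 0, 1, 0,  0, 0, 0, 0]
group = [0] * 4 + [1] * 4 + [2] * 4
offset = [0.0] * len(y)          # X*beta held fixed, here zero
u = pirls(y, group, offset, theta=1.5, n_groups=3)
print(u)
```

The "+ 1.0" in the denominator is the penalty. It is what keeps the mode finite for the first and third groups, whose responses are all 1 or all 0.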

Linear mixed models (LMMs) are simpler to fit because there is a
closed-form expression for the conditional modes of the random effects and
the conditional estimates of the fixed effects given theta.  Also, the
conditional distribution of U given Y = y is a multivariate normal so the
Laplace "approximation" is actually the exact value of the log-likelihood.
See section 3 of https://www.jstatsoft.org/article/view/v067i01
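
For reference, the closed form for the LMM is the solution of a penalized least squares problem. For given theta, the conditional modes and the conditional estimate of beta solve one linear system, written here in the notation of this message with u-tilde and beta-hat as labels chosen for the sketch:

```latex
\begin{bmatrix}
  \Lambda' Z' Z \Lambda + I & \Lambda' Z' X \\
  X' Z \Lambda & X' X
\end{bmatrix}
\begin{bmatrix} \tilde{u} \\ \hat{\beta} \end{bmatrix}
=
\begin{bmatrix} \Lambda' Z' y \\ X' y \end{bmatrix}
```

If beta is held fixed instead, the first block row alone gives (Lambda'Z'Z Lambda + I) u-tilde = Lambda'Z'(y - X beta).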

I'm not sure if this explanation is illuminating or confusing.  To recap,
evaluating the deviance of a GLMM at given values of beta and theta requires
- evaluating Lambda from theta
- setting X*beta as the offset in the model
- determining the conditional modes of the random effects, U, via PIRLS
- evaluating an approximation to the integral of the unscaled conditional
density
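
The four steps above can be sketched in Python for the simplest case, a logistic model with only an intercept beta0 and a scalar random intercept per group. This is an illustration under those assumptions, not the MixedModels.jl or lme4 implementation; the function name and the toy data are hypothetical. The log-determinant uses the working weights at the conditional modes, that is Lambda'Z'WZ Lambda + I, which is diagonal here.

```python
import math

def laplace_deviance(beta0, theta, y, group, n_groups, n_iter=100):
    # Step 1: Lambda from theta. For a scalar random intercept, Lambda = theta * I.
    lam = theta
    # Step 2: X*beta is treated as an offset. With only an intercept it is beta0.
    offset = [beta0] * len(y)
    # Step 3: conditional modes of u by PIRLS (written as the equivalent Newton step).
    u = [0.0] * n_groups
    for _ in range(n_iter):
        score = [0.0] * n_groups
        wsum = [0.0] * n_groups
        for yi, g, off in zip(y, group, offset):
            mu = 1.0 / (1.0 + math.exp(-(off + lam * u[g])))
            score[g] += yi - mu
            wsum[g] += mu * (1.0 - mu)
        u = [u[j] + (lam * score[j] - u[j]) / (lam * lam * wsum[j] + 1.0)
             for j in range(n_groups)]
    # Step 4: Laplace approximation to the deviance at the conditional modes.
    dev_resid_sq = 0.0
    wsum = [0.0] * n_groups
    for yi, g, off in zip(y, group, offset):
        eta = off + lam * u[g]
        mu = 1.0 / (1.0 + math.exp(-eta))
        # Squared deviance residual of a Bernoulli response: -2 * log-likelihood.
        dev_resid_sq += -2.0 * (yi * eta - math.log1p(math.exp(eta)))
        wsum[g] += mu * (1.0 - mu)
    penalty = sum(uj * uj for uj in u)                          # squared length of u
    logdet = sum(math.log(lam * lam * w + 1.0) for w in wsum)   # log|Lambda'Z'WZ Lambda + I|
    return dev_resid_sq + penalty + logdet

# Toy data (hypothetical): three groups of four binary responses each.
y = [1, 1, 1, 1,  1, 0, 1, 0,  0, 0, 0, 0]
group = [0] * 4 + [1] * 4 + [2] * 4
print(laplace_deviance(0.0, 1.5, y, group, 3))
```

An optimizer would call a function like this repeatedly, varying beta0 and theta to minimize the returned value. With theta = 0 the penalty and log-determinant terms vanish and the result is the ordinary binomial deviance.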

I agree that the penalty is an important justification for random effects
versus fixed effects.  John Tukey put a positive spin on the situation by
saying that we are "borrowing strength" from the other individuals in the
sample when we use random effects.

Consider fitting an item response model to a dichotomous response (say
correct/incorrect) where each response is associated with a subject and an
item (e.g. a question on an exam).  Often such models are fit using
fixed effects for the subject and random effects for the item, in which
case there is no finite estimate of the fixed effect for a subject who gets
all the items correct.  If we model both subjects and items with random
effects, the penalty or shrinkage brings the extremes of the estimates for
the subjects in closer to the population mean.
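
A small Python sketch shows the effect for a single subject who answers every item correctly. It makes simplifying assumptions that are not in the model above: all item difficulties are fixed at zero, and the subject's ability a has a N(0, sigma^2) penalty with sigma known. The function names are hypothetical. Without the penalty the score is positive for every a, so the log-likelihood keeps increasing and has no finite maximizer. With the penalty there is a finite mode.

```python
import math

def score_unpenalized(a, n_items):
    """Derivative of n_items * log(logistic(a)) with respect to a; positive for all a."""
    return n_items * (1.0 - 1.0 / (1.0 + math.exp(-a)))

def penalized_mode(n_items, sigma, lo=0.0, hi=50.0):
    """Bisection on the penalized score: score_unpenalized(a) - a / sigma^2 = 0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score_unpenalized(mid, n_items) - mid / sigma ** 2 > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A subject with 10 items, all correct (hypothetical numbers).
for a in (0.0, 5.0, 20.0):
    print("unpenalized score at", a, "=", score_unpenalized(a, 10))
a_hat = penalized_mode(10, sigma=1.0)
print("penalized mode:", a_hat)
```

The penalized mode is where the pull of the data (the score) balances the pull toward the population mean (a / sigma^2), which is the shrinkage described above.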

I hope this helps.

On Mon, Apr 4, 2016 at 6:22 AM Paul Johnson <pauljohn32 at gmail.com> wrote:

> I'm trying to explain GLMM estimation and defend random effects in an
> audience of econometricians.  I am focused on logit models, mostly, with
> random intercepts.
>
> The literature is difficult. I can understand applications and overviews
> like the paper by Bolker et al in Trends in Ecology and Evolution. But I
> can't understand much about glmer that is deeper than that.  Can you point
> me at some books/articles that explain the GLMM estimation process in an
> understandable way? Is there a PLS derivation for GLMM?  I want to better
> understand adaptive quadrature. And Laplace approximation.
>
> You might be able to advise me better if I tell you why I need to know.
>
> I was surprised to learn that economists hate random effects. It is almost
> visceral. For the economists, the fixed vs random effects debate is not
> philosophical, but rather practical. In an LMM, the Hausman test seems to
> bluntly reject almost all random effects models.  (See William Greene's
> Econometrics book).  Unmeasured group-level predictors always exist, it
> seems, so random effects estimates are biased/inconsistent. Even if you
> believe Intercept differences are random, LMM estimates are
> biased/inconsistent, so you should treat as fixed.
>
> I'm a little surprised there is so little discussion of Hausman's test in
> the random effect literature outside economics.
>
> One argument used by random effect advocates, that group-level predictors
> can be included in LMM, holds no weight at all. It is just evidence of bias
> in LMM. Well, the estimates thus obtained are useless because, if the group
> level intercept estimates were correct, then the group-level predictors
> would not be identifiable.
>
> My argument with them so far is based on the characterization of LMM as a
> PLS exercise, which I learned in this email list. That makes a point
> obvious: the fixed vs random models differ because PLS penalizes the b's,
> but fixed estimators do not. The issue is not "random" against "fixed". It
> is penalized against unpenalized. The parallel between LMM and ridge
> regression and LASSO helps.  If the number of observations within groups
> grows, then the posterior modes  and the fixed effect estimates converge.
> Yes?
>
> The small sample debate hinges on mean square error of the b's. The PLS
> view makes it plain that Empirical Bayes gives shrinkage not as
> afterthought (as it seems in the GLS narrative), but as a primary element
> (Henderson's estimator).   Ironically, the shrinkage effect of LMM, widely
> praised in stats and hierarchical modeling applications, raises suspicion
> of bias. One might prefer a biased, but lower variance estimate of b, and
> that's shrinkage.  That's my theme, anyway, we'll see if I can sell it.
>
> In that context, I come to the chore of comparing a glmer estimate with a
> fixed effect method known as conditional logit or Chamberlain's panel logit
> model.
>
> pj
> Paul Johnson
> http://pj.freefaculty.org