# [R-sig-ME] lmer: ML and REML estimation

Douglas Bates bates at stat.wisc.edu
Sat Mar 28 15:45:31 CET 2009

On Fri, Mar 27, 2009 at 1:01 PM, dave fournier <otter at otter-rsch.com> wrote:
>  Maybe I'm missing something here. The way I see it,
> in its most general form a mixed model is simply one
> where some of the parameters are random variables with a prior.

That's a rather general class of models.  Some might use the name
"Empirical Bayes".

I read the phrase "some of the parameters are random variables" to be
referring to the random effects.  I phrase things slightly
differently.  In particular I don't regard the random effects as
parameters. I regard a mixed-effects model as being based on two
random variables: the response Y whose value, y, has been observed and
an unobserved random effects vector B.  The probability model
describes the conditional distribution of Y, given B = b, and the
unconditional distribution of B.  For the models that can be fit by
the lme4 package the unconditional distribution of B is a multivariate
Gaussian with mean zero and a parameterized variance-covariance
matrix.  The conditional mean of Y, given B = b, depends on b through
a linear predictor expression, X\beta + Zb, that also incorporates the
fixed-effects parameters \beta.

The random effects B are similar to the fixed-effects \beta in that
they both occur as coefficients in the linear predictor but I find
that, for me at least, it is important to retain a distinction between
the random effects and the model parameters.

> So say the likelihood looks like
>
>   f(x,u)p(u)
>
> where the vector of parameters u have a prior p(u).
> If the u are normally distributed with a vector of
> std devs sigma this becomes
>
>    f(x,u)p(u,sigma)
>
> Integrating over u gives
>
>  F(x,sigma) = \Int F(x,u)p(u,sigma) du
>
> so the MLE's xhat,sigamhat are found by maximizing
> F wrt x and sigma.

Well, that was kind of the point of the discussion.  For a linear
mixed model there are many equivalent ways to arrive at the REML
criterion.  One way that provides a convenient computational approach
is to maximize, with respect to the random-effects dispersion
parameters, the integral of the likelihood over the fixed-effects
parameters.  The original motivation for the criterion is as the
maximum likelihood estimator of the variance parameters based on a set
of contrasts that are orthogonal to the fixed-effects model space.
That is, these are the maximum likelihood estimators from a set of
linear combinations of data values that are to a certain subspace, the
subspace of all possible residuals.  This is why they are called the
restricted or residual maximum likelihood estimators.

I first saw the relationship between maximizing the integrated
likelihood and the REML estimators in the 1982 Biometrics paper
"Random Effects Models for Longitudinal Data" by Laird and Ware but it
may have been known before that.

It is certainly possible to define estimators that maximize the
integrated likelihood in more general mixed-effects model
specifications but should we call these REML estimators and do we know
that they will share the properties that make REML a popular
estimation criterion for linear mixed models?

I oversimplified when I said that REML is not defined for GLMMs.  I
was using REML to refer to the estimators based on orthogonal
subspaces in the sample space.

> Now if we are willing to take a Bayesian point of view

I thought you were already there when you wrote about "parameters are
random variables with a prior" :-)

> we could also integrate over x and get the marginal
> likelihood for the sigma alone. We may need a prior on x,
> or assume a locally uniform prior. In any event we obtain
>
>  G(sigma) = \Int F(x,u)p(u,sigma) du dx
>
> The integral can be approximated using the Laplace approximation
> at the mode of  F(x,u)p(u,sigma).  Now I'm not claiming that this
> approach always works well, but it is a simple recipe for producing
> an estimation procedure which may give better estimates for sigma.
> Using AD Model Builders Random effects module which is freely available
> at http://admb-project.org it is simple to
> change the model specification to carry out this procedure. As I said
> before it seemed to work well in a nonlinear fisheries model I looked
> at.
>
> --
> David A. Fournier
> P.O. Box 2040,
> Sidney, B.C. V8l 3S3