# [R-sig-ME] Calculation of random effects for factors in R

Douglas Bates bates at stat.wisc.edu
Sat Aug 1 18:27:01 CEST 2015

```On Sat, Aug 1, 2015 at 9:17 AM Paul Johnson <pauljohn32 at gmail.com> wrote:

> On Tue, Jul 28, 2015 at 9:01 AM, Sudell, Maria [mesudell]
> <M.E.Sudell at liverpool.ac.uk> wrote:
> > Hello,
> > I have a question concerning exactly how random effects for a factor are
> calculated in R.  I have tried to find an answer on various R related
> websites and text books but cannot find a definitive explanation.
> >
> > As an example, if you had a longitudinal dataset, and you wanted to
> include an individual specific random effect for a smoking factor (say 3
> levels, current, ex, never), how would the random effects be calculated
> using R?  (I understand how to code this in R, I am aiming to understand
> the mechanics of how the function gets to the random effects).
> >
> I'm putting together class notes on this, but they are not quite
> ready, or else I would give them to you.
>
> The Pinheiro & Bates book (2000) is the classic statement on this.
> There is a newer article that the lme4 team prepared for JSS will
> answer this for you.  Those are technically demanding.  I have found
> there are easier-to read interpretations of this in the Gelman & Hill
> 2007 book and in Ben Bolker's book Ecological Models and Data in R.
>
> The approach is penalized maximum likelihood, in which the random
> effects are conceptualized as coefficients on a random effects design
> matrix.   I did not realize how difficult this was to explain until I
> tried with some students. If you bang your head on a few of these
> books for a while, get the 2006 book by Simon Wood on generalized
> additive models. On the way to GAMs, he's got about the most beautiful
>  explanation of how these models are estimated that you will ever find.
> That's technically challenging, but I've never seen the structure laid out
> so beautifully.

I would not use the phrase "penalized maximum likelihood".  The evaluation
of the likelihood is done by solving a penalized least squares problem.  It
is this that makes it possible to fit models with a large number of
fixed-effects coefficients reasonably quickly because the fixed-effects
coefficients, and one of the variance parameters, can be "profiled out" of
the optimization problem.  This doesn't change the definition of the
maximum likelihood estimates, it simply makes them easier to evaluate.

In general I find that definitions of "<whatever> maximum likelihood"
estimators shouldn't be given those names.  The beauty of maximum
likelihood estimation is that the definition of the likelihood criterion is
so simple.  One you describe the probability model for the responses as a
function of the parameter values you use the same expression as the
probability density or probability mass function but regarded as a function
of the parameters with the responses fixed at the observed values.  There
are no optional flavors or "mix-ins" as at an ice-cream store.  Once you
have defined the probability model there is only one way to define the
likelihood.  The values of the parameters that maximize the likelihood are
the maximum likelihood estimates.

I do not try to hide the fact that I am a recovering mathematician (it's
been over thirty years since I have been tempted to prove a theorem but we
must remain vigilant) and I am very literal about the definitions of terms
in mathematics or mathematical statistics.  Given a probability model there
is only one likelihood function.  We can manipulate and re-express it to
make computation easier; for example, we typically optimize the
log-likelihood because it is more amenable to most optimization algorithms,
but we can do so because the values of the parameters that optimize the
log-likelihood are the same as those that optimize the likelihood.

My understanding so far would be that indicator variables for each of the
> levels of the factor would be included (in this case 3 indicator variables
> of 0,1, one for each of current, ex, never).  Then coefficients for the
> indicator variables would be found (so for each individual in the dataset,
> we would end up with a coefficient for one of the indicator variables,
> assuming that individuals can't be in more than one group).  These random
> coefficients (one for each individual as each individual would only fall
> into one smoking status) would then have their mean and variation
> calculated, in order to report the distribution of the random effect.  Is
> this correct?
> >
> Not exactly.  The estimate of the variance of the random effect is a
> parameter estimate, and so far as I can tell, it is not ever linked or
> even compared against the estimates of the individual case random
> effects. That's an interesting question, though.  Until you asked, I
> had not thought much about it.  I've never run ranef() to get the
> individual random effect estimates and calculated their variance.
>

On the surface it is entirely reasonable to expect the observed
variance-covariance of the BLUP's to be similar to the estimate of the
parameters. But when you look deeper you discover that those two quantities
are from different distributions and there is no good reason to expect that
they should be similar.

I prefer the term "conditional means", in the case of linear mixed models
(LMMs), or "conditional modes", for GLMMs or NLMMs, because, well, that's
what they are.  Other than having a cute acronym the phrase "best linear
unbiased predictor" does not, to me at least, describe what these things
are.  If I say I want the values that maximize the conditional density of
the random effects, given the observed data and with the parameters set to
their maximum likelihood estimates, I have defined these values in a
meaningful way and it is natural to call them the "conditional modes".  It
happens that in a Gaussian linear mixed model all the distributions being
considered are themselves multivariate Gaussian and hence the conditional
modes are also the conditional means.

The covariance parameters being estimated are from the variance-covariance
matrix of the unconditional distribution of the random effects. There is no
reason to expect them to correspond to the variance-covariance of the
conditional modes or means.

If you want to make an analogy to Bayesian inference, the parameters being
estimated define the prior distribution and the conditional means are from
the posterior distribution.

Sorry for going on at such length.  I hope this is, to some at least, more
illuminating than confusing.

Theoretically, we know the estimated random effects are a blend of the
> estimates you would get if you treated each subgroup in isolation and
> the estimate you get if you pool all of the data.  And the sample size
> within each group determines how much weight is placed on the
> subgroup-specific estimate.
>
> Since those estimates of the b's are shrunken in that way, their
> variance won't necessarily coincide with the variance number at the
> top of the lmer output.
>
> Anyway, I've been reading the papers by Doug Bates and now the larger
> lme4 team and they explain all this thoroughly.
>
> > Apologies for such a simple question.  Any help or explanation (or point
> to relevant paper or textbook) of how random effects are calculated for
> factors in R would be greatly appreciated.
> > Many thanks
> > Maria
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>
> --
> Paul E. Johnson
> Professor, Political Science        Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org              http://crmda.ku.edu
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

[[alternative HTML version deleted]]

```