# [R-sig-ME] lme4 vs. HLM (program) - differences in estimation?

Douglas Bates bates at stat.wisc.edu
Tue Nov 18 21:10:28 CET 2008

```On Tue, Nov 18, 2008 at 11:02 AM, Felix Schönbrodt <nicebread at gmx.net> wrote:
> Hi all,
>
> in my workgroup the HLM program by Raudenbush et al. is the de facto
> standard - however, I'd like to do my mixed models in R using the lme4
> package (as I do all other computations in R as well...)
>
> Both as a practice and as an argument for my colleagues I tried to replicate
> (amongst others) the omnipresent "math-achievement-in-catholic-vs.-public
> schools"-example (using the original HLM-data set) in lme4.
>
> My first question is: is my formula in lmer the right one?
>
> --> HLM-style model ( Y = MATHACH; ID = grouping factor)
>
> Level-1 Model
>
>        Y = B0 + B1*(SES) + R
>
> Level-2 Model
>        B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
>        B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1
>
> Combined Model:
>
>        Y = G00 + G01*(SECTOR) + G02*(MEANSES) + G11*(SECTOR)*(SES) +
> G12*(MEANSES)*(SES) + U0 + U1*(SES) + R
>
> --> lmer-style:
>
>        HSB.ML <- lmer(mathach ~ sector + meanses + ses + meanses*ses +
> sector*ses + (ses.gmc | id), data=HSB)

That formula is not incorrect but it is a bit redundant.  In the R
formula notation the ':' operator creates an interaction and the '*'
operator is used to cross effects.  Thus meanses*ses expands to
meanses + ses + meanses:ses

> All models yield the same results concerning fixed effects; models including
> random slopes however show some minor divergence in the random effects
> variance (not very much, but it is there ...; see sample outputs below). In
> the presented example, the variance component is 0.10 (lme4) vs. 0.15 (HLM).
> I am aware that these differences are really small - in this case the
> difference was only 0.1% of the residual variance.

Does the HLM specification allow for correlation of the random effects
within ID group?  The lmer specification does.

> Nonetheless I would really be happy if someone could shed some light on this
> issue, so that I can go to my colleagues and say: "We get slightly different
> results _because_ [...], BUT this is not a problem, _because_ ..."
>
>
> From my web and literature searches I have two guesses about these
> differences in both programs:
>
> - a statement by Douglas Bates, that lme4 uses another estimation algorithm:
> "[...] but may be of interest to some who are familiar with a generalized
> least squares (GLS) representation of mixed-models (such as used in MLWin
> and HLM). The lme4 package uses a penalized least squares (PLS)
> (http://markmail.org/search/?q=lme4+PLS+GLS#query:lme4%20PLS%20GLS+page:1+mid:hb6p6mk6sztkvii2+state:results)

The criterion, either the log-likelihood for ML estimates or the REML
criterion for REML estimates, is defined as a property of the data and
the model.  The issue I was discussing there is how to evaluate the
log-likelihood or the REML criterion given the data, the model and
values of the parameters.  Solving a penalized least squares problem
or a generalized least squares problem is just a step in the
evaluation.  Both paths should give the same answer.  The reasons I
prefer the penalized least squares approach have to do with accuracy,
reliability and speed as well as generality of the approach.

I think it will be more important to check that the model
specifications are the same and that you are using the same criterion.
The default criterion for lmer is REML.  I don't know what the
default criterion for HLM is.

> - HLM maybe does some Bayesian Estimation, what lme4 does not do (?)

I'm not sure what that would mean.  To a statistician "Bayesian
Estimation" would be associated with Bayesian representations of
models in which the parameters are regarded as random variables.  The
estimation criterion would change from the likelihood (or a related
criterion like the REML criterion) to maximizing the "posterior
density" of the parameters.  Because the posterior density depends on
the likelihood and on the prior density you must specify both the
probability model and prior densities to be able to define the
estimates.

At least that is the statistician's view of Bayesian estimation.  In
some fields, like machine learning, the adjective Bayesian is applied
to any algorithm that seems remotely related to a probability model.
For example, if you check how movie ratings are calculated at imdb.com
they use what they term is a Bayesian algorithm which, in their case,
means that they use a weighted average of the actual votes for the
movie and a typical vote so that a movie that is highly rated by very
few people doesn't suddenly become their number 1 rated movie of all
time.  Just saying it is Bayesian doesn't define the answer.  You need
to specify the probability model.

I would check two things: does HLM estimate two variances and a
covariance for the random effects and is it using the REML criterion
or the ML criterion.

>
> But: these only are guesses from my side ... I'm not a statistician, but
> would like to understand it (a bit)
>
> All best,
> Felix
>
>
> #-----------------------------------
> # OUTPUT FROM HLM
> # ------------------------------------
>
>  Final estimation of fixed effects:
>  ----------------------------------------------------------------------------
>                                       Standard             Approx.
>    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
>  ----------------------------------------------------------------------------
>  For       INTRCPT1, B0
>    INTRCPT2, G00          12.096006   0.198734    60.865       157    0.000
>      SECTOR, G01           1.226384   0.306271     4.004       157    0.000
>     MEANSES, G02           5.333056   0.369161    14.446       157    0.000
>  For      SES slope, B1
>    INTRCPT2, G10           2.937980   0.157140    18.697       157    0.000
>      SECTOR, G11          -1.640951   0.242912    -6.755       157    0.000
>     MEANSES, G12           1.034418   0.302574     3.419       157    0.001
>  ----------------------------------------------------------------------------
>
>
>  Final estimation of variance components:
>  -----------------------------------------------------------------------------
>  Random Effect           Standard      Variance     df    Chi-square
>  P-value
>                         Deviation     Component
>  -----------------------------------------------------------------------------
>  INTRCPT1,       U0        1.54271       2.37996   157     605.29563
>  0.000
>      SES slope, U1        0.38603       0.14902   157     162.30883    0.369
>  level-1,       R         6.05831      36.70309
>  -----------------------------------------------------------------------------
>
>
>
> #-----------------------------------
> # OUTPUT FROM LMER
> #---------------------------------------
>
> Linear mixed model fit by REML
> Formula: mathach ~ meanses + sector + ses.gmc + meanses * ses.gmc + sector *
>      ses.gmc + (ses.gmc | id)
>   Data: HSB
>   AIC   BIC logLik deviance REMLdev
>  46524 46592 -23252    46496   46504
>
> Random effects:
>  Groups   Name        Variance Std.Dev. Corr
>  id       (Intercept)  2.379   1.543
>          ses.gmc      0.101   0.318    0.391
>  Residual             36.721   6.060
> Number of obs: 7185, groups: id, 160
>
> Fixed effects:
>                Estimate Std. Error t value
> (Intercept)     12.09600    0.19873  60.866
> meanses          5.33291    0.36916  14.446
> sector           1.22645    0.30627   4.005
> ses.gmc          2.93877    0.15509  18.949
> meanses:ses.gmc  1.03890    0.29889   3.476
> sector:ses.gmc  -1.64261    0.23978  -6.850
>
>
> ___________________________________
> Dipl. Psych. Felix Schönbrodt
> Department of Psychology
> Ludwig-Maximilian-Universität (LMU) Munich
> Leopoldstr. 13
> D-80802 München
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

```