[R-sig-ME] lme4 vs. HLM (program) - differences in estimation?

Tue Nov 18 21:23:59 CET 2008

On Tue, Nov 18, 2008 at 2:05 PM, Doran, Harold <HDoran at air.org> wrote:
> Felix:

> First off, you haven't group mean centered. With that said, I don't know why people do this anyhow, but I can see that you are not using a group mean centered variable.

> Group mean centering means the SES variable for person i in school j would be the same for all i. But, in your variable, ses.gmc varies by student just as it does in the original variable ses. Your variable ses.gm is the group mean variable

I'm not sure about that.  I believe that Felix is correct that the
"centered" part refers to the difference between the student's ses
score and some standard value and the "group mean centered" would
refer to using the observed mean ses for each group as the centering
value.  The variable ses.gmc would vary with person i in school j but
it would have the property that the values of ses.gmc summed across
all the students in school j would be zero.

Having said that, I would agree with you that there is more emphasis
on centering of variables in the HLM literature than I feel is
warranted.  Centering is related to parameterization and most of the
time changing parameters does not change the model.  One set of
parameters may be more convenient than another or provide a better
conditioned estimation situation but usually the model is not changed.

> In fact, what you would need for a group mean centered variable is:
>
> hsb$ses.gmc <- ave(hsb$ses, hsb$id)

> Which requires less work than your code, but gives the same result.

> Now, the variable meanses in the HSB data is supposed to be the group mean centered variable. But, when you compare to my values using ave() you can see the meanses variable in the HSB data are wrong.

I recall getting a result like that when I first looked at those data.
 I spent some time trying to reproduce the HLM results but eventually
gave up because of anomalies like this. The code for lmer is available
for anyone who wants to check exactly what is going on and the
evaluation of the log-likelihood and the REML criterion is described
in some detail - probably more detail than most people would want to
know - in one of the vignettes

> I suspect you are not using the same data between HLM and R and that may be the problem. That is, in R you create a variable called ses.gmc thinking this is a group mean centered variable. But, HLM "automagically" does group mean centering for you if you ask it to.
>
> When you work in HLM are you using the exact same data file that you created for your use in R? Or, are you relying on HLM to group mean center for you? If so, I suspect that is the issue. In fact, we can see that the difference in your estimates lies in this variable and this is the variable you manipulate in R, so I suspect the error may no be a function of estimation differences, but of data differences.
>
> With that said, I have found GLMM results between HLM and lmer to match exactly even though HLM uses a 6th order taylor series expansion and R uses a laplace approximation with a 2nd order taylor series. I have also found estimates from linear models to match exactly even though HLM uses a different method of estimation than lmer as well.
>
>> -----Original Message-----
>> From: Felix Schönbrodt [mailto:nicebread at gmx.net]
>> Sent: Tuesday, November 18, 2008 2:30 PM
>> To: Doran, Harold
>> Cc: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] lme4 vs. HLM (program) - differences
>> in estimation?
>>
>> Thanks for your fast reply,
>>
>> your right, I missed that in my explanation: Raudenbush et
>> al. enter the socioeconomic status (ses) group centered into
>> their model.
>> Therefore I calculated that group centered variable and
>> called it ses.gmc (gmc = group mean centered; [I just
>> recognize that this notion is ambiguous to "grand mean centered...])
>>
>> # group-centering of ses (as R&B do ...) # ses.gm = group
>> mean # ses.gmc = ses [group mean centered]
>>
>> gMeans <- aggregate(HSB$ses, list(HSB$id), mean)
>> names(gMeans) <- c("id", "ses.gm")
>> HSB <- merge(HSB, gMeans, by="id")
>> HSB$ses.gmc <- HSB$ses - HSB$ses.gm
>>
>> HSB.ML <- lmer(mathach ~ meanses + sector + ses.gmc +
>> meanses*ses.gmc
>> + sector*ses.gmc + (ses.gmc|id), data=HSB)
>>
>> I think despite that, we have the same dataset.
>> I used lme4_0.999375-27.
>>
>> Felix
>>
>>
>>
>>
>> Am 18.11.2008 um 19:54 schrieb Doran, Harold:
>>
>> > Felix
>> >
>> > I think I can help a bit. I need to know what ses.gmc is. I
>> have those
>> > data and have run the following model:
>> >
>> > lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)
>> >
>> > My output below shows the data (your and mine) have the
>> same number of
>> > obs. But, the original data distributed with HLM does not have a
>> > variable called ses.gmc. The only variables in the data are:
>> >
>> >> str(hsb)
>> > 'data.frame':   7185 obs. of  11 variables:
>> > $ id      : Factor w/ 160 levels "1224","1288",..: 1 1 1 1
>> 1 1 1 1 1
>> > 1 ...
>> > $ minority: num  0 0 0 0 0 0 0 0 0 0 ...
>> > $ female  : num  1 1 0 0 0 0 1 0 1 0 ...
>> > $ ses     : num  -1.528 -0.588 -0.528 -0.668 -0.158 ...
>> > $ mathach : num   5.88 19.71 20.35  8.78 17.90 ...
>> > $ size    : num  842 842 842 842 842 842 842 842 842 842 ...
>> > $ sector  : num  0 0 0 0 0 0 0 0 0 0 ...
>> > $ pracad  : num  0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35
>> 0.35 0.35 ...
>> > $ disclim : num  1.60 1.60 1.60 1.60 1.60 ...
>> > $ himinty : num  0 0 0 0 0 0 0 0 0 0 ...
>> > $ meanses : num  -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428
>> > -0.428 -0.428 -0.428 ...
>> >
>> > Also you should report what version of lme4 you are using. Do a
>> > sessionInfo() and this will appear.
>> >
>> >> lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)
>> > Linear mixed model fit by REML
>> > Formula: mathach ~ meanses * ses + sector * ses + (ses | id)
>> >   Data: hsb
>> >   AIC   BIC logLik deviance REMLdev
>> > 46525 46594 -23253    46498   46505
>> > Random effects:
>> > Groups   Name        Variance Std.Dev. Corr
>> > id       (Intercept)  2.40744 1.55159
>> >          ses          0.01462 0.12091  1.000
>> > Residual             36.75838 6.06287
>> > Number of obs: 7185, groups: id, 160
>> >
>> > Fixed effects:
>> >            Estimate Std. Error t value
>> > (Intercept)  12.0948     0.2028   59.65
>> > meanses       3.3202     0.3885    8.55
>> > ses           2.9052     0.1483   19.59
>> > sector        1.1949     0.3079    3.88
>> > meanses:ses   0.8462     0.2718    3.11
>> > ses:sector   -1.5781     0.2245   -7.03
>> >
>> > Correlation of Fixed Effects:
>> >            (Intr) meanss ses    sector mnss:s
>> > meanses      0.213
>> > ses          0.076 -0.146
>> > sector      -0.676 -0.344 -0.064
>> > meanses:ses -0.143  0.178  0.278 -0.081 ses:sector  -0.062 -0.080
>> > -0.679  0.093 -0.356
>> >
>> >> -----Original Message-----
>> >> From: r-sig-mixed-models-bounces at r-project.org
>> >> [mailto:r-sig-mixed-models-bounces at r-project.org] On
>> Behalf Of Felix
>> >> Schönbrodt
>> >> Sent: Tuesday, November 18, 2008 12:02 PM
>> >> To: r-sig-mixed-models at r-project.org
>> >> Subject: [R-sig-ME] lme4 vs. HLM (program) - differences in
>> >> estimation?
>> >>
>> >> Hi all,
>> >>
>> >> in my workgroup the HLM program by Raudenbush et al. is
>> the de facto
>> >> standard - however, I'd like to do my mixed models in R using the
>> >> lme4 package (as I do all other computations in R as well...)
>> >>
>> >> Both as a practice and as an argument for my colleagues I tried to
>> >> replicate (amongst others) the omnipresent
>> >> "math-achievement-in- catholic-vs.-public schools"-example
>> (using the
>> >> original HLM-data set) in lme4.
>> >>
>> >> My first question is: is my formula in lmer the right one?
>> >>
>> >> --> HLM-style model ( Y = MATHACH; ID = grouping factor)
>> >>
>> >> Level-1 Model
>> >>
>> >>    Y = B0 + B1*(SES) + R
>> >>
>> >> Level-2 Model
>> >>    B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
>> >>    B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1
>> >>
>> >> Combined Model:
>> >>
>> >>    Y = G00 + G01*(SECTOR) + G02*(MEANSES) + G11*(SECTOR)*(SES) +
>> >> G12*(MEANSES)*(SES) + U0 + U1*(SES) + R
>> >>
>> >> --> lmer-style:
>> >>
>> >>    HSB.ML <- lmer(mathach ~ sector + meanses + ses + meanses*ses +
>> >> sector*ses + (ses.gmc | id), data=HSB)
>> >>
>> >>
>> >>
>> >> All models yield the same results concerning fixed effects; models
>> >> including random slopes however show some minor divergence in the
>> >> random effects variance (not very much, but it is there ...; see
>> >> sample outputs below). In the presented example, the variance
>> >> component is 0.10 (lme4) vs. 0.15 (HLM).
>> >> I am aware that these differences are really small - in
>> this case the
>> >> difference was only 0.1% of the residual variance.
>> >>
>> >> Nonetheless I would really be happy if someone could shed
>> some light
>> >> on this issue, so that I can go to my colleagues and say: "We get
>> >> slightly different results _because_ [...], BUT this is not a
>> >> problem, _because_ ..."
>> >>
>> >>
>> >> From my web and literature searches I have two guesses about these
>> >> differences in both programs:
>> >>
>> >> - a statement by Douglas Bates, that lme4 uses another estimation
>> >> algorithm:
>> >> "[...] but may be of interest to some who are familiar with a
>> >> generalized least squares (GLS) representation of
>> mixed-models (such
>> >> as used in MLWin and HLM). The lme4 package uses a penalized least
>> >> squares (PLS) representation instead."
>> >> (http://markmail.org/search/?q=lme4+PLS+GLS#query:lme4%20PLS%20GLS
>> >> +page:1+mid:hb6p6mk6sztkvii2+state:results)
>> >>
>> >> - HLM maybe does some Bayesian Estimation, what lme4 does
>> not do (?)
>> >>
>> >> But: these only are guesses from my side ... I'm not a
>> statistician,
>> >> but would like to understand it (a bit)
>> >>
>> >> All best,
>> >> Felix
>> >>
>> >>
>> >> #-----------------------------------
>> >> # OUTPUT FROM HLM
>> >> # ------------------------------------
>> >>
>> >>  Final estimation of fixed effects:
>> >>
>> >> --------------------------------------------------------------
>> >> --------------
>> >>                                        Standard             Approx.
>> >>     Fixed Effect         Coefficient   Error      T-ratio
>> >> d.f.
>> >> P-value
>> >>
>> >> --------------------------------------------------------------
>> >> --------------
>> >>  For       INTRCPT1, B0
>> >>     INTRCPT2, G00          12.096006   0.198734    60.865
>> >> 157    0.000
>> >>       SECTOR, G01           1.226384   0.306271     4.004
>> >> 157    0.000
>> >>      MEANSES, G02           5.333056   0.369161    14.446
>> >> 157    0.000
>> >>  For      SES slope, B1
>> >>     INTRCPT2, G10           2.937980   0.157140    18.697
>> >> 157    0.000
>> >>       SECTOR, G11          -1.640951   0.242912    -6.755
>> >> 157    0.000
>> >>      MEANSES, G12           1.034418   0.302574     3.419
>> >> 157    0.001
>> >>
>> >> --------------------------------------------------------------
>> >> --------------
>> >>
>> >>
>> >>  Final estimation of variance components:
>> >>
>> >> --------------------------------------------------------------
>> >> ---------------
>> >>  Random Effect           Standard      Variance     df
>> >> Chi-square
>> >> P-value
>> >>                          Deviation     Component
>> >>
>> >> --------------------------------------------------------------
>> >> ---------------
>> >>  INTRCPT1,       U0        1.54271       2.37996   157
>> >> 605.29563    0.000
>> >>       SES slope, U1        0.38603       0.14902   157
>> >> 162.30883    0.369
>> >>   level-1,       R         6.05831      36.70309
>> >>
>> >> --------------------------------------------------------------
>> >> ---------------
>> >>
>> >>
>> >>
>> >> #-----------------------------------
>> >> # OUTPUT FROM LMER
>> >> #---------------------------------------
>> >>
>> >> Linear mixed model fit by REML
>> >> Formula: mathach ~ meanses + sector + ses.gmc + meanses * ses.gmc +
>> >> sector *      ses.gmc + (ses.gmc | id)
>> >>    Data: HSB
>> >>    AIC   BIC logLik deviance REMLdev
>> >>  46524 46592 -23252    46496   46504
>> >>
>> >> Random effects:
>> >>  Groups   Name        Variance Std.Dev. Corr
>> >>  id       (Intercept)  2.379   1.543
>> >>           ses.gmc      0.101   0.318    0.391
>> >>  Residual             36.721   6.060
>> >> Number of obs: 7185, groups: id, 160
>> >>
>> >> Fixed effects:
>> >>                 Estimate Std. Error t value
>> >> (Intercept)     12.09600    0.19873  60.866
>> >> meanses          5.33291    0.36916  14.446
>> >> sector           1.22645    0.30627   4.005
>> >> ses.gmc          2.93877    0.15509  18.949
>> >> meanses:ses.gmc  1.03890    0.29889   3.476
>> >> sector:ses.gmc  -1.64261    0.23978  -6.850
>> >>
>> >>
>> >> ___________________________________
>> >> Dipl. Psych. Felix Schönbrodt
>> >> Department of Psychology
>> >> Ludwig-Maximilian-Universität (LMU) Munich Leopoldstr. 13
>> >> D-80802 München
>> >> _______________________________________________
>> >> R-sig-mixed-models at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>
>>
>>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>