[R-sig-ME] lme4 vs. HLM (program) - differences in estimation?
Doran, Harold
HDoran at air.org
Tue Nov 18 21:05:09 CET 2008
Felix:
First off, you haven't group mean centered. (That said, I don't know why people do this anyhow.) But I can see that you are not using a group mean centered variable.
Group mean centering means the SES variable for person i in school j would be the same for all i. But in your variable, ses.gmc varies by student just as it does in the original variable ses. Your variable ses.gm is the group mean variable.
In fact, what you would need for a group mean centered variable is:
hsb$ses.gmc <- ave(hsb$ses, hsb$id)
This requires less work than your code, but gives the same result.
Now, the variable meanses in the HSB data is supposed to be this group mean centered variable. But when you compare it to my values from ave(), you can see that the meanses values in the HSB data are wrong.
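To make the distinction concrete, here is a minimal sketch on a toy data frame (the values are invented for illustration and are not from the HSB data):

```r
# Toy stand-in for the HSB data (made-up values)
hsb <- data.frame(id  = factor(c(1, 1, 2, 2)),
                  ses = c(-1.0, 0.0, 0.5, 1.5))

# ave() with no FUN argument returns the group mean of ses,
# repeated for every student within a school
hsb$ses.gm <- ave(hsb$ses, hsb$id)
# -0.5 -0.5 1.0 1.0  -- constant within each school

# Subtracting the group mean gives the group-mean-centered variable,
# which still varies from student to student
hsb$ses.gmc <- hsb$ses - hsb$ses.gm
# -0.5 0.5 -0.5 0.5
```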
I suspect you are not using the same data between HLM and R and that may be the problem. That is, in R you create a variable called ses.gmc thinking this is a group mean centered variable. But, HLM "automagically" does group mean centering for you if you ask it to.
When you work in HLM, are you using the exact same data file that you created for use in R? Or are you relying on HLM to group mean center for you? If so, I suspect that is the issue. In fact, we can see that the difference in your estimates lies in this variable, and this is the variable you manipulate in R, so I suspect the error may not be a function of estimation differences, but of data differences.
With that said, I have found GLMM results from HLM and lmer to match exactly even though HLM uses a 6th-order Taylor series expansion and R uses a Laplace approximation with a 2nd-order Taylor series. I have also found estimates from linear models to match exactly, even though HLM also uses a different method of estimation than lmer.
> -----Original Message-----
> From: Felix Schönbrodt [mailto:nicebread at gmx.net]
> Sent: Tuesday, November 18, 2008 2:30 PM
> To: Doran, Harold
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] lme4 vs. HLM (program) - differences
> in estimation?
>
> Thanks for your fast reply,
>
> you're right, I missed that in my explanation: Raudenbush et
> al. enter socioeconomic status (ses) group mean centered into
> their model.
> Therefore I calculated that group-centered variable and
> called it ses.gmc (gmc = group mean centered; [I just
> realized that this abbreviation is ambiguous with "grand mean centered" ...])
>
> # group-centering of ses (as R&B do ...)
> # ses.gm  = group mean
> # ses.gmc = ses, group mean centered
>
> gMeans <- aggregate(HSB$ses, list(HSB$id), mean)
> names(gMeans) <- c("id", "ses.gm")
> HSB <- merge(HSB, gMeans, by="id")
> HSB$ses.gmc <- HSB$ses - HSB$ses.gm
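> The same two variables can also be built in one step with ave(); this
> is just a sketch of the equivalent shortcut, not part of the original
> analysis:
>
> ```r
> # group mean of ses within each school (ave() defaults to FUN = mean)
> HSB$ses.gm  <- ave(HSB$ses, HSB$id)
> # group-mean-centered ses: each student's deviation from the school mean
> HSB$ses.gmc <- HSB$ses - HSB$ses.gm
> ```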
>
> HSB.ML <- lmer(mathach ~ meanses + sector + ses.gmc +
> meanses*ses.gmc
> + sector*ses.gmc + (ses.gmc|id), data=HSB)
>
> Apart from that, I think we are using the same dataset.
> I used lme4_0.999375-27.
>
> Felix
>
>
>
>
> Am 18.11.2008 um 19:54 schrieb Doran, Harold:
>
> > Felix
> >
> > I think I can help a bit. I need to know what ses.gmc is. I have
> > those data and have run the following model:
> >
> > lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)
> >
> > My output below shows the data (yours and mine) have the same
> > number of obs. But the original data distributed with HLM does not have a
> > variable called ses.gmc. The only variables in the data are:
> >
> >> str(hsb)
> > 'data.frame': 7185 obs. of 11 variables:
> > $ id      : Factor w/ 160 levels "1224","1288",..: 1 1 1 1 1 1 1 1 1 1 ...
> > $ minority: num 0 0 0 0 0 0 0 0 0 0 ...
> > $ female : num 1 1 0 0 0 0 1 0 1 0 ...
> > $ ses : num -1.528 -0.588 -0.528 -0.668 -0.158 ...
> > $ mathach : num 5.88 19.71 20.35 8.78 17.90 ...
> > $ size : num 842 842 842 842 842 842 842 842 842 842 ...
> > $ sector : num 0 0 0 0 0 0 0 0 0 0 ...
> > $ pracad  : num 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 ...
> > $ disclim : num 1.60 1.60 1.60 1.60 1.60 ...
> > $ himinty : num 0 0 0 0 0 0 0 0 0 0 ...
> > $ meanses : num -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 ...
> >
> > Also, you should report which version of lme4 you are using. Run
> > sessionInfo() and it will appear.
> >
> >> lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)
> > Linear mixed model fit by REML
> > Formula: mathach ~ meanses * ses + sector * ses + (ses | id)
> > Data: hsb
> > AIC BIC logLik deviance REMLdev
> > 46525 46594 -23253 46498 46505
> > Random effects:
> > Groups Name Variance Std.Dev. Corr
> > id (Intercept) 2.40744 1.55159
> > ses 0.01462 0.12091 1.000
> > Residual 36.75838 6.06287
> > Number of obs: 7185, groups: id, 160
> >
> > Fixed effects:
> > Estimate Std. Error t value
> > (Intercept) 12.0948 0.2028 59.65
> > meanses 3.3202 0.3885 8.55
> > ses 2.9052 0.1483 19.59
> > sector 1.1949 0.3079 3.88
> > meanses:ses 0.8462 0.2718 3.11
> > ses:sector -1.5781 0.2245 -7.03
> >
> > Correlation of Fixed Effects:
> > (Intr) meanss ses sector mnss:s
> > meanses 0.213
> > ses 0.076 -0.146
> > sector -0.676 -0.344 -0.064
> > meanses:ses -0.143  0.178  0.278 -0.081
> > ses:sector  -0.062 -0.080 -0.679  0.093 -0.356
> >
> >> -----Original Message-----
> >> From: r-sig-mixed-models-bounces at r-project.org
> >> [mailto:r-sig-mixed-models-bounces at r-project.org] On
> Behalf Of Felix
> >> Schönbrodt
> >> Sent: Tuesday, November 18, 2008 12:02 PM
> >> To: r-sig-mixed-models at r-project.org
> >> Subject: [R-sig-ME] lme4 vs. HLM (program) - differences in
> >> estimation?
> >>
> >> Hi all,
> >>
> >> in my workgroup the HLM program by Raudenbush et al. is
> the de facto
> >> standard - however, I'd like to do my mixed models in R using the
> >> lme4 package (as I do all other computations in R as well...)
> >>
> >> Both as a practice and as an argument for my colleagues I tried to
> >> replicate (among others) the ubiquitous
> >> "math-achievement-in-catholic-vs.-public-schools" example (using the
> >> original HLM data set) in lme4.
> >>
> >> My first question is: is my formula in lmer the right one?
> >>
> >> --> HLM-style model ( Y = MATHACH; ID = grouping factor)
> >>
> >> Level-1 Model
> >>
> >> Y = B0 + B1*(SES) + R
> >>
> >> Level-2 Model
> >> B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
> >> B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1
> >>
> >> Combined Model:
> >>
> >> Y = G00 + G01*(SECTOR) + G02*(MEANSES) + G11*(SECTOR)*(SES) +
> >> G12*(MEANSES)*(SES) + U0 + U1*(SES) + R
> >>
> >> --> lmer-style:
> >>
> >> HSB.ML <- lmer(mathach ~ sector + meanses + ses.gmc + meanses*ses.gmc +
> >> sector*ses.gmc + (ses.gmc | id), data=HSB)
> >>
> >>
> >>
> >> All models yield the same results concerning fixed effects; models
> >> including random slopes, however, show some minor divergence in the
> >> random effects variance (not very much, but it is there ...; see the
> >> sample outputs below). In the presented example, the variance
> >> component is 0.10 (lme4) vs. 0.15 (HLM).
> >> I am aware that these differences are really small - in this case
> >> the difference was only 0.1% of the residual variance.
> >>
> >> Nonetheless I would really be happy if someone could shed
> some light
> >> on this issue, so that I can go to my colleagues and say: "We get
> >> slightly different results _because_ [...], BUT this is not a
> >> problem, _because_ ..."
> >>
> >>
> >> From my web and literature searches I have two guesses about the
> >> differences between the two programs:
> >>
> >> - a statement by Douglas Bates, that lme4 uses another estimation
> >> algorithm:
> >> "[...] but may be of interest to some who are familiar with a
> >> generalized least squares (GLS) representation of mixed-models
> >> (such as used in MLWin and HLM). The lme4 package uses a penalized least
> >> squares (PLS) representation instead."
> >> (http://markmail.org/search/?q=lme4+PLS+GLS#query:lme4%20PLS%20GLS
> >> +page:1+mid:hb6p6mk6sztkvii2+state:results)
> >>
> >> - maybe HLM does some Bayesian estimation that lme4 does not do (?)
> >>
> >> But these are only guesses on my part ... I'm not a statistician,
> >> but I would like to understand it (a bit).
> >>
> >> All best,
> >> Felix
> >>
> >>
> >> #-----------------------------------
> >> # OUTPUT FROM HLM
> >> # ------------------------------------
> >>
> >> Final estimation of fixed effects:
> >>
> >> ----------------------------------------------------------------------------
> >>                                      Standard             Approx.
> >>     Fixed Effect       Coefficient   Error      T-ratio   d.f.     P-value
> >> ----------------------------------------------------------------------------
> >> For INTRCPT1, B0
> >>     INTRCPT2, G00       12.096006    0.198734    60.865   157      0.000
> >>     SECTOR, G01          1.226384    0.306271     4.004   157      0.000
> >>     MEANSES, G02         5.333056    0.369161    14.446   157      0.000
> >> For SES slope, B1
> >>     INTRCPT2, G10        2.937980    0.157140    18.697   157      0.000
> >>     SECTOR, G11         -1.640951    0.242912    -6.755   157      0.000
> >>     MEANSES, G12         1.034418    0.302574     3.419   157      0.001
> >> ----------------------------------------------------------------------------
> >>
> >>
> >> Final estimation of variance components:
> >>
> >> -----------------------------------------------------------------------------
> >>  Random Effect         Standard     Variance      df    Chi-square   P-value
> >>                        Deviation    Component
> >> -----------------------------------------------------------------------------
> >>  INTRCPT1,     U0       1.54271      2.37996     157    605.29563    0.000
> >>  SES slope,    U1       0.38603      0.14902     157    162.30883    0.369
> >>  level-1,      R        6.05831     36.70309
> >> -----------------------------------------------------------------------------
> >>
> >>
> >>
> >> #-----------------------------------
> >> # OUTPUT FROM LMER
> >> #---------------------------------------
> >>
> >> Linear mixed model fit by REML
> >> Formula: mathach ~ meanses + sector + ses.gmc + meanses * ses.gmc +
> >> sector * ses.gmc + (ses.gmc | id)
> >> Data: HSB
> >> AIC BIC logLik deviance REMLdev
> >> 46524 46592 -23252 46496 46504
> >>
> >> Random effects:
> >> Groups Name Variance Std.Dev. Corr
> >> id (Intercept) 2.379 1.543
> >> ses.gmc 0.101 0.318 0.391
> >> Residual 36.721 6.060
> >> Number of obs: 7185, groups: id, 160
> >>
> >> Fixed effects:
> >> Estimate Std. Error t value
> >> (Intercept) 12.09600 0.19873 60.866
> >> meanses 5.33291 0.36916 14.446
> >> sector 1.22645 0.30627 4.005
> >> ses.gmc 2.93877 0.15509 18.949
> >> meanses:ses.gmc 1.03890 0.29889 3.476
> >> sector:ses.gmc -1.64261 0.23978 -6.850
> >>
> >>
> >> ___________________________________
> >> Dipl. Psych. Felix Schönbrodt
> >> Department of Psychology
> >> Ludwig-Maximilian-Universität (LMU) Munich
> >> Leopoldstr. 13
> >> D-80802 München
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
>
>