[R-sig-ME] lme4 vs. HLM (program) - differences in estimation?

Felix Schönbrodt nicebread at gmx.net
Tue Nov 18 18:02:17 CET 2008


Hi all,

in my workgroup, the HLM program by Raudenbush et al. is the de facto
standard; however, I'd like to do my mixed models in R using the lme4
package (as I do all my other computations in R as well ...)

Both as practice and as an argument for my colleagues, I tried to
replicate (among others) the omnipresent "math achievement in
Catholic vs. public schools" example in lme4, using the original
HLM data set.

My first question is: is my formula in lmer the right one?

--> HLM-style model (Y = MATHACH; ID = grouping factor)

Level-1 Model

	Y = B0 + B1*(SES) + R

Level-2 Model
	B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
	B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1

Combined Model:

	Y = G00 + G01*(SECTOR) + G02*(MEANSES) + G10*(SES) + G11*(SECTOR)*(SES) +
	    G12*(MEANSES)*(SES) + U0 + U1*(SES) + R

--> lmer-style (using ses.gmc, the group-mean-centered SES,
consistently, to match the output below):

	HSB.ML <- lmer(mathach ~ meanses + sector + ses.gmc + meanses*ses.gmc +
	               sector*ses.gmc + (ses.gmc | id), data=HSB)
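
(A side note on the formula: since a*b expands to a + b + a:b in R,
the separate main-effect terms above are redundant, and a shorter,
equivalent formula should give identical fixed-effect estimates. A
minimal check, assuming the HSB data frame with ses.gmc is loaded:)

	library(lme4)
	# full spelling, as above
	m1 <- lmer(mathach ~ meanses + sector + ses.gmc + meanses*ses.gmc +
	           sector*ses.gmc + (ses.gmc | id), data=HSB)
	# compact, equivalent spelling
	m2 <- lmer(mathach ~ meanses*ses.gmc + sector*ses.gmc +
	           (ses.gmc | id), data=HSB)
	all.equal(fixef(m1), fixef(m2))   # should be TRUE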



All models yield the same results for the fixed effects; models
including random slopes, however, show some minor divergence in the
random-effects variances (not very much, but it is there ...; see the
sample outputs below). In the presented example, the slope variance
component is 0.10 (lme4) vs. 0.15 (HLM).
I am aware that these differences are really small; in this case the
difference is only about 0.1% of the residual variance.

Nonetheless, I would really be happy if someone could shed some light
on this issue, so that I can go to my colleagues and say: "We get
slightly different results _because_ [...], BUT this is not a problem,
_because_ ..."
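
(If anyone wants to dig in, here is how I would probe this from the R
side; just a sketch, assuming the HSB data frame is loaded. The idea
is to fit the same model with nlme::lme, which uses a different
optimization algorithm than lme4, and to check whether REML vs. full
ML matters:)

	library(lme4)
	# lme4 fit (REML, as in the output below)
	fm.lme4 <- lmer(mathach ~ meanses*ses.gmc + sector*ses.gmc +
	                (ses.gmc | id), data=HSB)
	# same model via nlme; if this agrees with lme4 but not with HLM,
	# the gap is more likely in HLM's stopping rules than in R
	fm.nlme <- nlme::lme(mathach ~ meanses*ses.gmc + sector*ses.gmc,
	                     random = ~ ses.gmc | id, data=HSB)
	# full ML instead of REML, in case the criterion (not the
	# algorithm) is what differs
	fm.ML <- lmer(mathach ~ meanses*ses.gmc + sector*ses.gmc +
	              (ses.gmc | id), data=HSB, REML=FALSE)
	VarCorr(fm.lme4)
	nlme::VarCorr(fm.nlme)
	VarCorr(fm.ML)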


From my web and literature searches, I have two guesses about the
differences between the two programs:

- a statement by Douglas Bates that lme4 uses a different estimation
algorithm (I try to spell out my understanding of this below the list):
"[...] but may be of interest to some who are familiar with a
generalized least squares (GLS) representation of mixed-models (such
as used in MLWin and HLM). The lme4 package uses a penalized least
squares (PLS) representation instead."
(http://markmail.org/search/?q=lme4+PLS+GLS#query:lme4%20PLS%20GLS+page:1+mid:hb6p6mk6sztkvii2+state:results)

- maybe HLM does some Bayesian estimation, which lme4 does not do (?)
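
To spell out the first guess (just my rough understanding, corrections
very welcome): a GLS-based program alternates between solving for the
fixed effects given the current variance components and then updating
those components. lmer instead profiles out the fixed effects and the
residual variance and, for each value of the relative covariance
parameters theta, solves a single penalized least-squares problem,
something like

	min over (beta, u) of  ||y - X*beta - Z*Lambda(theta)*u||^2 + ||u||^2

where b = Lambda(theta)*u are the random effects. At convergence both
routes should arrive at the same (RE)ML estimates, so any remaining
difference would come from where each optimizer decides to stop.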

But: these are only guesses on my part ... I'm not a statistician,
but I would like to understand it (a bit).

All best,
Felix


#------------------------------------
# OUTPUT FROM HLM
#------------------------------------

 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard            Approx.
    Fixed Effect         Coefficient   Error      T-ratio  d.f.    P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.096006   0.198734    60.865  157     0.000
      SECTOR, G01           1.226384   0.306271     4.004  157     0.000
     MEANSES, G02           5.333056   0.369161    14.446  157     0.000
 For      SES slope, B1
    INTRCPT2, G10           2.937980   0.157140    18.697  157     0.000
      SECTOR, G11          -1.640951   0.242912    -6.755  157     0.000
     MEANSES, G12           1.034418   0.302574     3.419  157     0.001
 ----------------------------------------------------------------------------


 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square   P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        1.54271       2.37996   157     605.29563    0.000
      SES slope, U1        0.38603       0.14902   157     162.30883    0.369
  level-1,       R         6.05831      36.70309
 -----------------------------------------------------------------------------



#------------------------------------
# OUTPUT FROM LMER
#------------------------------------

Linear mixed model fit by REML
Formula: mathach ~ meanses + sector + ses.gmc + meanses * ses.gmc + sector * ses.gmc + (ses.gmc | id)
    Data: HSB
    AIC   BIC logLik deviance REMLdev
  46524 46592 -23252    46496   46504

Random effects:
  Groups   Name        Variance Std.Dev. Corr
  id       (Intercept)  2.379   1.543
           ses.gmc      0.101   0.318    0.391
  Residual             36.721   6.060
Number of obs: 7185, groups: id, 160

Fixed effects:
                 Estimate Std. Error t value
(Intercept)     12.09600    0.19873  60.866
meanses          5.33291    0.36916  14.446
sector           1.22645    0.30627   4.005
ses.gmc          2.93877    0.15509  18.949
meanses:ses.gmc  1.03890    0.29889   3.476
sector:ses.gmc  -1.64261    0.23978  -6.850
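
(For what it's worth, the "about 0.1% of the residual variance" above
is just this back-of-the-envelope calculation from the two outputs:)

	# slope variance: 0.14902 (HLM) vs. 0.101 (lme4);
	# residual variance: ~36.7 in both outputs
	(0.14902 - 0.101) / 36.7   # ~0.0013, i.e. roughly 0.1%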


___________________________________
Dipl. Psych. Felix Schönbrodt
Department of Psychology
Ludwig-Maximilians-Universität (LMU) Munich
Leopoldstr. 13
D-80802 München
