[R-sig-ME] lme4 vs. HLM (program) - differences in estimation?
HDoran at air.org
Tue Nov 18 23:08:02 CET 2008
Just a thought. There is an implicit assumption in this thread that the lmer function is useful or correct because it lines up with HLM results. Now, I don't think looking to such comparisons is a bad idea, per se, especially when one is doing software development and maybe some unit testing is in order. Or, if one is very familiar with HLM and wants to learn to use lmer by comparing HLM output with the new results obtained under lmer.
However, I would caution that lmer isn't good merely because it aligns with HLM. In fact, the computational methods implemented in lmer far exceed those of almost all other programs designed for linear mixed-effects (or generalized linear mixed) models.
lmer also lives inside a very nice programming environment (R) that allows you to manipulate data and run the model all in one place, so there are significant ease-of-use advantages as well.
I don't think you are necessarily making this assumption. But it may be useful for you to outline criteria for software evaluation for your colleagues and line HLM and lmer up next to each other against those criteria.
HLM is a useful program, but I would argue that the lmer function is much more capable of handling some complex issues.
From: Felix Schönbrodt [mailto:nicebread at gmx.net]
Sent: Tuesday, November 18, 2008 4:52 PM
To: Douglas Bates
Cc: Doran, Harold; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] lme4 vs. HLM (program) - differences in estimation?
Thanks very much for your thorough remarks!
I suspect you are not using the same data between HLM and R and that may be the problem. That is, in R you create a variable called ses.gmc thinking this is a group mean centered variable. But, HLM "automagically" does group mean centering for you if you ask it to.
When you work in HLM, are you using the exact same data file that you created for your use in R? Or are you relying on HLM to group mean center for you? If so, I suspect that is the issue. In fact, we can see that the difference in your estimates lies in this variable, and this is the variable you manipulate in R, so I suspect the discrepancy may not be a function of estimation differences, but of data differences.
I followed your suggestion about the possibly different centering approach in HLM and did a run with only raw variables (no centering). The small differences remain.
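To rule out centering differences entirely, the group-mean centering can also be done explicitly in R before fitting, so that both HLM and lmer see identical columns. A minimal sketch in base R (the column names `ses` and `school` are hypothetical stand-ins for whatever the actual data file uses):

```r
# Group-mean centering by hand, so the same centered column can be
# exported and fed to HLM instead of relying on HLM's automatic
# centering. 'ses' and 'school' are hypothetical column names.
dat <- data.frame(
  school = rep(c("A", "B"), each = 3),
  ses    = c(1, 2, 3, 10, 20, 30)
)
# ave() returns each group's mean, repeated for every row of the group
dat$ses.gmc <- dat$ses - ave(dat$ses, dat$school)
```

Each school's centered values now average to zero; writing `dat` out with `write.table()` and importing that same file into HLM guarantees the two programs start from the same numbers.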
I would check two things: does HLM estimate two variances and a
covariance for the random effects and is it using the REML criterion
or the ML criterion.
HLM is using the REML criterion; as for the variances of the random effects, I haven't yet been able to dig deep enough to answer that question ...
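Both points are easy to check on the lmer side of the comparison. A sketch with simulated two-level data (a hypothetical stand-in for the HS&B file; all variable names are made up): `isREML()` reports which criterion was used, and `VarCorr()` shows whether two variances and a covariance were estimated for the random effects.

```r
library(lme4)

set.seed(1)
# simulate a small two-level data set: students nested in schools
hsb <- data.frame(school = factor(rep(1:20, each = 10)))
hsb$ses <- rnorm(200)
u0 <- rnorm(20, sd = 2)                     # random intercept per school
hsb$mathach <- 12 + 2 * hsb$ses + u0[hsb$school] + rnorm(200, sd = 3)

# random intercept and slope for ses -> two variances plus one
# covariance, which is what HLM should also be estimating
fm <- lmer(mathach ~ ses + (ses | school), data = hsb)

isREML(fm)     # TRUE: lmer fits by REML by default, like HLM here
VarCorr(fm)    # intercept variance, slope variance, their correlation
# refit with ML if the program being compared uses the ML criterion:
fm.ml <- refitML(fm)
```

If the two programs disagree on either the criterion or the number of variance components, the estimates cannot be expected to match.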
From a lot of comparisons I have run now, I can conclude the following:
- fixed effects are usually equal up to 3 digits after the decimal point
- random variance components can show small deviations if they are very close to zero (and are not significant anyway, judging by the p-value reported by HLM, which is usually > 0.50 in these cases). However, I am aware of the discussion concerning the appropriateness of the p-values reported by HLM (see also http://markmail.org/search/?q=lme4+p-value#query:lme4%20p-value+page:1+mid:3t4hxxrlh3uvb7kh+state:results).
Maybe I was nitpicking too much about these small differences - they only seem to appear in coefficients/variance components that are insignificant anyway. But the discussion was definitely productive for me, and I think I can now convince my colleagues to use lme4 instead ;-)
Maybe someone wants to dig deeper into this issue; the data set from the HLM program (HS&B data) can be downloaded, for example, here: http://www.hlm-online.com/datasets/HSBDataset/HSBdata.zip.