[R-sig-ME] I'm sorry, and here is what I mean to ask about speed

Paul Johnson pauljohn32 at gmail.com
Thu Sep 24 03:42:42 CEST 2009


I'm sorry I made Doug mad and I'm sorry to have led the discussion off
into such a strange, disagreeable place.

Now that I understand your answers, I believe I can ask the question
in a non-naive way.  I believe this version should not provoke some of
the harsh words that I triggered in my awkward question.

New Non-Naive version of the Speed Question

Do you have a copy of HLM6 on your system?  Maybe you could help me by
running the same model in R (with any of the packages such as lme4,
nlme, or whatever) and HLM6 and let us all know if you get similar
estimates and how long it takes to run each one.

Here's why I ask.

My colleague has HLM6 on Windows XP and he compared a two-level linear
mixed effects model fitted with lmer from lme4 against HLM6.  He
surprised me by claiming that the HLM6 model estimation was completed
in about 1.5 seconds and the lmer estimation took 50 seconds.  That
did not seem right to me.  I looked a bit at his example and made a
few mental notes so I could ask you what to look for when I go back to
dig into this.  There are 27000 cases in his dataset and he has about
25 variables at the lower level of observation and 4 or 5 variables at
the higher level, which I think is the county of survey respondents.
He is fitting a random intercept (random across counties) and several
random slopes for the higher level variables.

He pointed out that the MLwiN website reported speed differences in
2006 that were about the same.  Of course, you and I know that R and
all of the mixed effects packages have improved significantly since
then. That is why the speed gap on the one Windows XP system surprised
me.

Can you tell me if you see a difference between the two programs (if
you have HLM6)?  If you see a difference of the same magnitude, it may
mean we are not mistaken in our conclusion.  But if you see no
difference, then it will mean I'm getting it wrong and I should
investigate more. If I can't solve it, I should provide a reproducible
example for your inspection.  I will ask permission to release the
private data to you in that case.
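
Until I can share the real data, something like the following could
stand in for timing purposes.  This is only a sketch on simulated
data.  The 27000 cases match my colleague's dataset, but the number of
counties (100), the five predictors x1 to x5, the single random slope
on x1, and all the coefficient values are assumptions I made up for
illustration; his actual model has more variables than this.

```r
## Sketch only: simulated stand-in data, not the real survey.
## Assumed (not from the real data): 100 counties, 5 predictors,
## one random slope, and every coefficient value below.
library(lme4)

set.seed(42)
n_obs    <- 27000   # number of cases, as in the real dataset
n_county <- 100     # assumption: the real county count is not known to me

county_id <- sample.int(n_county, n_obs, replace = TRUE)
x <- matrix(rnorm(n_obs * 5), n_obs, 5,
            dimnames = list(NULL, paste0("x", 1:5)))
dat <- data.frame(county = factor(county_id), x)

## County-level random intercepts and random slopes for x1
b0 <- rnorm(n_county, sd = 1.0)
b1 <- rnorm(n_county, sd = 0.5)
dat$y <- 1 + b0[county_id] + (0.5 + b1[county_id]) * dat$x1 +
  0.3 * dat$x2 + rnorm(n_obs)

## Time the REML fit (REML = TRUE is the lmer default)
timing <- system.time(
  fit <- lmer(y ~ x1 + x2 + x3 + x4 + x5 + (1 + x1 | county),
              data = dat, REML = TRUE)
)

print(timing["elapsed"])   # wall-clock seconds for the fit
print(fixef(fit))          # fixed effects, to compare against HLM6
print(VarCorr(fit))        # variance components, to compare against HLM6
```

If you write dat out to a file and fit the same specification in HLM6,
the elapsed time and the two sets of estimates can be compared
directly.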

Perhaps you think there are good reasons why R estimation takes longer.  E.g.:
1. HLM programmers are free to borrow optimizations from lmer and
other open source programs, but they don't share their own
optimizations in return.
2. lmer and other R routines are making calculations in a better way,
a more accurate way, so we should not worry that they take longer.
   That was my first guess; in the original mail I said I thought that
HLM was using PQL whereas lmer is using Laplace or Adaptive Gaussian
Quadrature.  But Doug's comment indicated that I was mistaken to
expect a difference there, because REML is the default in lmer and it
is also what HLM is doing, and there's no involvement of quadrature or
integral approximation in a linear mixed model (Gaussian dependent
variable).

On the other hand, perhaps you are (like me) surprised by this
difference and you want to help me figure out the cause of the
differences.  If you have ideas about that, maybe we can work together
(I don't suck at C!). I have a fair amount of experience profiling C
programs and helped optimize a big-ish C++ based R package this
summer.

So far, I have a simple observer's interest in this question.   I
advise people whether it is beneficial for them to spend their scarce
resources on a commercial package like HLM6 and one of the factors
that is important to them is how "fast" the programs are.  I
personally don't see an urgent reason to buy HLM just because it can
estimate a model in 1 second while an open source approach requires 50
seconds.  But I'm not the one making the decision.  If I can make the R
version run almost as fast as HLM6, or provide reasons why people
might benefit from using a program that takes longer, then I can do my
job of advising the users.

I am sorry if this question appears impertinent or insulting. I do not
mean it as a criticism.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



