[R-sig-ME] I'm sorry, and here is what I mean to ask about speed

Thu Sep 24 04:34:13 CEST 2009

On Thu, September 24, 2009 1:42 am, Paul Johnson wrote:
> Do you have a copy of HLM6 on your system?  Maybe you could help me by
> running the same model in R (with any of the packages such as lme4,
> nlme, or whatever) and HLM6 and let us all know if you get similar
> estimates and how long it takes to run each one.
>

There is a time-limited trial version of HLM
http://www.ssicentral.com/hlm/downloads.html

I haven't tried it, but I expect that I won't like it. I expect it will
have a philosophy that every model is special, rather than providing a
general model specification as in R.

>
> My colleague has HLM6 on Windows XP and he compared a two-level linear
> mixed effects model fitted with lmer from lme4 against HLM6.  He
> surprised me by claiming that the HLM6 model estimation was completed
> in about 1.5 seconds and the lmer estimation took 50 seconds.  That
> did not seem right to me.  I looked a bit at his example and made a
> few mental notes so I could ask you what to look for when I go back to
> dig into this.  There are 27000 cases in his datasets and he has about
> 25 variables at the lower level of observation and 4 or 5 variables at
> the higher level, which I think is the county of survey respondents.
> He is fitting a random intercept (random across counties) and several
> random slopes for the higher level variables.
>
> He pointed out that the MLwiN website reported speed differences in
> 2006 that were about the same.  Of course, you and I know that R and
> all of the mixed effects packages have improved significantly since
> then. That is why the speed gap on the one Windows XP system surprised
> me.
>
> Can you tell me if you see a difference between the two programs (if
> you have HLM6)?  If you see a difference of the same magnitude, it may
> mean we are not mistaken in our conclusion.  But if you see no
> difference, then it will mean I'm getting it wrong and I should
> investigate more. If I can't solve it, I should provide a reproducible
> example for your inspection.  I will ask permission to release the
> private data to you in that case.
>
> Perhaps you think there are good reasons why R estimation takes longer.
> E.g.:
> 1. HLM programmers have full access to benefit from optimizations in
> lmer and other open programs, but they don't share their optimizations
> in return.
> 2. lmer and other R routines are making calculations in a better way,
> a more accurate way, so we should not worry that they take longer.

I don't know what HLM uses. lme used an EM algorithm, which is one of the
slower ways, but with excellent properties, and I assume lmer uses the
same. It may be that Doug has set some of the convergence criteria so that
it will work with very complex models, and these could be relaxed at the
user's peril.  I would prefer the slower, stricter settings.  R can also
take a lot of time to do some things, and the only way around this is to
rewrite everything in C or Fortran.

To me R has the advantage that I can set up a large number of complex
analyses with graphs, and just run the lot. If time became a concern
because it was preventing other use of my computer then I would set up a
linux server and run everything remotely.
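
For anyone who wants to repeat the timing without the private data, a
simulated data set of roughly the shape Paul describes might do. The
following is only a minimal sketch: the number of counties, the variable
names (x1, z, county) and all effect sizes are my assumptions, it has one
predictor per level instead of 25 and 4 or 5, and the random slope is on the
lower-level predictor. The fit is skipped if lme4 is not installed.

```r
# Simulate a two-level data set: 27000 respondents nested in counties.
# Every name and value here is an illustrative assumption, not the real data.
set.seed(1)
n_county <- 100                        # assumed number of counties
n_obs    <- 27000                      # number of cases mentioned in the thread
idx      <- sample(n_county, n_obs, replace = TRUE)  # county of each respondent
x1       <- rnorm(n_obs)               # one lower-level predictor
z_county <- rnorm(n_county)            # one county-level predictor
u0       <- rnorm(n_county, sd = 0.5)  # county random intercepts
u1       <- rnorm(n_county, sd = 0.3)  # county random slopes for x1
y <- 1 + 0.5 * x1 + 0.2 * z_county[idx] +
  u0[idx] + u1[idx] * x1 + rnorm(n_obs)
sim <- data.frame(y = y, x1 = x1, z = z_county[idx], county = factor(idx))

# Time a REML fit with a random intercept and a random slope across counties.
if (requireNamespace("lme4", quietly = TRUE)) {
  timing <- system.time(
    fit <- lme4::lmer(y ~ x1 + z + (1 + x1 | county), data = sim, REML = TRUE)
  )
  print(timing["elapsed"])             # wall-clock seconds for the fit
  print(lme4::VarCorr(fit))            # estimated variance components
}
```

The same simulated file could be exported with write.csv() and read into
HLM6, so that both programs are timed on identical data.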

>    That was my first guess, in the original mail I said I thought that
> HLM was using PQL whereas lmer is using Laplace or Adaptive Gaussian
> Quadrature.  But Doug's comment indicated that I was mistaken to
> expect a difference there because REML is the default in lmer and it
> is also what HLM is doing, and there's no involvement of quadrature or
> integral approximation in a mixed linear model (gaussian dependent
> variable).
>
> On the other hand, perhaps you are (like me) surprised by this
> difference and you want to help me figure out the cause of the
> differences.  If you have ideas about that, maybe we can work together
> (I don't suck at C!). I have a fair amount of experience profiling programs
> in C and did some optimization help on a big-ish C++ based R package
> this summer.
>
> So far, I have a simple observer's interest in this question.   I
> advise people whether it is beneficial for them to spend their scarce
> resources on a commercial package like HLM6 and one of the factors
> that is important to them is how "fast" the programs are.   I
> personally don't see an urgent reason to buy HLM just because it can
> estimate a model in 1 second and an open source approach requires 50
> seconds.  But I'm not the one making the decision. If I can make the R
> version run almost as fast as HLM6, or provide reasons why people
> might benefit from using a program that takes longer, then I can do my
> job of advising the users.
>
> I am sorry if this question appears impertinent or insulting. I do not
> mean it as a criticism.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas