[R-sig-ME] I'm sorry, and here is what I mean to ask about speed

Thu Sep 24 05:44:36 CEST 2009

Thanks for rephrasing your question, Paul.

On Wed, Sep 23, 2009 at 8:42 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> I'm sorry I made Doug mad and I'm sorry to have led the discussion off
> into such a strange, disagreeable place.
>
> Now that I understand your answers, I believe I can ask the question
> in a non-naive way.  I believe this version should not provoke some of
> the harsh words that I triggered in my awkward question.
>
> New Non-Naive version of the Speed Question
>
> Do you have a copy of HLM6 on your system?  Maybe you could help me by
> running the same model in R (with any of the packages such as lme4,
> nlme, or whatever) and HLM6 and let us all know if you get similar
> estimates and how long it takes to run each one.

I still claim it would help to have a reproducible example with known
data and a known model to fit.
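
As a rough illustration of what such an example could look like, here is a minimal sketch in R. It uses simulated data in place of the private survey data, and every name and value in it (county, x1, z1, the number of counties, the variances) is made up for illustration. It fits a random intercept and a single random slope by county, which is much simpler than the model described below, so its timing says nothing about the 50 seconds reported. It assumes the lme4 package is installed.

```r
## Sketch only: simulated data, not the survey data discussed in this thread.
## All variable names and parameter values are hypothetical.
library(lme4)

set.seed(1)
n_county <- 100                  # assumed number of counties
n_obs    <- 27000                # roughly the number of cases mentioned
cid      <- sample(n_county, n_obs, replace = TRUE)  # county id per case

x1 <- rnorm(n_obs)                          # respondent-level covariate
z1 <- rnorm(n_county)[cid]                  # county-level covariate
b0 <- rnorm(n_county, sd = 0.5)[cid]        # random intercepts by county
b1 <- rnorm(n_county, sd = 0.3)[cid]        # random slopes for x1 by county
y  <- 1 + b0 + (0.5 + b1) * x1 + 0.2 * z1 + rnorm(n_obs)

dat <- data.frame(y = y, x1 = x1, z1 = z1, county = factor(cid))

## Time a REML fit with a random intercept and a random slope for x1.
timing <- system.time(
  fit <- lmer(y ~ x1 + z1 + (1 + x1 | county), data = dat, REML = TRUE)
)

print(timing)        # elapsed time for the fit
print(fixef(fit))    # fixed-effects estimates, to compare across programs
print(VarCorr(fit))  # variance components, to compare across programs
```

Running the same simulated data through HLM6 would then give a comparison of both the estimates and the elapsed time that anyone on the list can repeat.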

> Here's why I ask.
>
> My colleague has HLM6 on Windows XP and he compared a two-level linear
> mixed effects model fitted with lmer from lme4 against HLM6.  He
> surprised me by claiming that the HLM6 model estimation was completed
> in about 1.5 seconds and the lmer estimation took 50 seconds.  That
> did not seem right to me.  I looked a bit at his example and made a
> few mental notes so I could ask you what to look for when I go back to
> dig into this.  There are 27000 cases in his dataset and he has about
> 25 variables at the lower level of observation and 4 or 5 variables at
> the higher level, which I think is the county of survey respondents.
> He is fitting a random intercept (random across counties) and several
> random slopes for the higher level variables.
>
> He pointed out that the MLwiN website reported speed differences in
> 2006 that were about the same.  Of course, you and I know that R and
> all of the mixed effects packages have improved significantly since
> then. That is why the speed gap on the one Windows XP system surprised
> me.
>
> Can you tell me if you see a difference between the two programs (if
> you have HLM6)?  If you see a difference of the same magnitude, it may
> mean we are not mistaken in our conclusion.  But if you see no
> difference, then it will mean I'm getting it wrong and I should
> investigate more. If I can't solve it, I should provide a reproducible
> example for your inspection.  I will ask permission to release the
> private data to you in that case.
>
> Perhaps you think there are good reasons why R estimation takes longer.  E.g.:
> 1. HLM programmers have full access to benefit from optimizations in
> lmer and other open programs, but they don't share their optimizations
> in return.
> 2. lmer and other R routines are making calculations in a better way,
> a more accurate way, so we should not worry that they take longer.
>   That was my first guess, in the original mail I said I thought that
> HLM was using PQL whereas lmer is using Laplace or Adaptive Gaussian
> Quadrature.  But Doug's comment indicated that I was mistaken to
> expect a difference there because REML is the default in lmer and it
> is also what HLM is doing, and there's no involvement of quadrature or
> integral approximation in a mixed linear model (Gaussian dependent
> variable).
>
> On the other hand, perhaps you are (like me) surprised by this
> difference and you want to help me figure out the cause of the
> differences.  If you have ideas about that, maybe we can work together
> (I don't suck at C!). I have a fair amount of experience profiling programs
> in C and did some optimization help on a big-ish C++ based R package
> this summer.
>
> So far, I have a simple observer's interest in this question.   I
> advise people whether it is beneficial for them to spend their scarce
> resources on a commercial package like HLM6 and one of the factors
> that is important to them is how "fast" the programs are.   I
> personally don't see an urgent reason to buy HLM because it can
> estimate a model in 1 second and an open source approach requires 50
> seconds.  But I'm not the one making the decision. If I can make the R
> version run almost as fast as HLM6, or provide reasons why people
> might benefit from using a program that takes longer, then I can do my
> job of advising the users.
>
> I am sorry if this question appears impertinent or insulting. I do not
> mean it as a criticism.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>