[R-sig-ME] More naive questions: Speed comparisons? what is a "stack imbalance" in lmer? does lmer center variables?

Ben Bolker bolker at ufl.edu
Thu Sep 24 02:11:47 CEST 2009


   I think this is reasonable and interesting, but it reads more like
an apology (in the technical sense, not indicating any sense of shame)
for R than a measured list of pros and cons.  I could quibble with
almost any of these statements ... I probably should just stop there,
but I'll continue below.

Adam D. I. Kramer wrote:
> Dear colleagues,
> 
>         Questions of speed, especially comparative speed between R and some
> proprietary program, come up with some frequency.  Perhaps we should add a
> FAQ that covers this?  Here is my first attempt at such a contribution.
> 
> Section: R Speed. I have {heard,seen,shown} that R is {a bit,a lot,eons}
> faster than [other program].  Is this true?
> 
> 0) We assume that the "benchmark" data sets adequately represent the sorts
> of tasks you might use the software for, and that it includes the whole
> process (including data import/transformation to prepare the model to run).

  Good point.
> 
> 1) Proprietary software produces unverifiable statistical results: You
> cannot deduce or detect when, whether, or how your results are wrong, unless
> you compare the results to a trustworthy algorithm.  It is always faster to
> use one method (e.g., the trustworthy one) than two.

   I don't think that simply having access to the code guarantees
"verification".  It sure helps, though!  My personal criteria for trust
in a piece of software would also include [in no particular order] (1)
trust in the credentials, skill, and attention to detail of the author;
(2) comparisons against benchmark cases where analytical results or
carefully hand-worked results are available; (3) comparisons against
other software.

> 
> 2) Paying skilled programmers to write, optimize, and compile statistical
> algorithms optimizes for speed.  Letting skilled statisticians who know how
> to code and are personally motivated to implement an algorithm optimizes for
> correctness and robustness of a model and its results.

  Yes, but ... where open source tends to fall down is in implementing
features (and I mean real features, not questionable "features") that
are outside the focus of the developers.  There are plenty of "wish
list" items that people would happily pay for that don't get done
because no one with the skills feels like doing it.  I've thought before
of trying to set up a bounty system for R, but even that is too far down
on my list to get to ...
> 
> 3) The extensible nature of R algorithms may require more processing time,
> but that is likely less than the time required to export, import, reformat,
> and re-tidy a data set for several programs (HLM6, SPSS, SAS, etc.), much as
> it is faster to visit one large department store to buy a frying pan,
> perfume, and a suit than to drive across town three times to three
> specialty stores.

  Analogy alert!
  Agreed that it is nice to use a single tool for all jobs, but that's
not necessarily the best idea.  I agree that all software should make it
easy to EXPORT data to other formats (or to interchange formats), and
open source software is often better at this.  (Should we really try to
re-implement GIS systems, relational databases, etc. within R? no.)

> 
> 4) The speed of R changes based on several factors, including the version of
> R you use and in some cases the BLAS against which you link.  If you believe
> R is much slower than it should be, be sure you have upgraded to the latest
> version and are using the appropriate version for your hardware and software
> (e.g., 64-bit R for Snow Leopard).

  yes.
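
  A quick way to check what you're running (a minimal sketch; the
matrix inversion is only a rough, BLAS-bound spot check, not a serious
benchmark):

    R.version.string   ## which version of R?
    sessionInfo()      ## platform, architecture, attached packages
    ## rough check of linear-algebra speed (sensitive to the BLAS in use):
    m <- matrix(rnorm(1e6), nrow = 1000)
    system.time(solve(m))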
> 
> 5) R's code is written to handle arbitrarily large data sets; many "very
> fast" algorithms that other programs could use may well make assumptions
> about data size that break down for large data sets, leading to
> crashes, errors, or results that seem correct but are not.

   Certainly not always true.  (See Kevin Wright's post.)

> 
> Further:
> 
> 6) R is more than a one-trick pony, and the ability to do nearly everything
> comes at the expense of micro-optimization.  Human triathletes likewise do
> not and cannot bike, run, or swim as quickly as dedicated cyclists,
> runners, or swimmers.

  Yeah, so?
  The "Unix philosophy" of tools suggests instead that one should
instead write sets of tools, each of which does one thing well, and have
them talk to each other.  Just saying that there are arguments for
different levels of modularity, specialization, etc..

> 
> 7) It is probably unreasonable to even expect open-source interpreted code
> to be as fast as code written by people whose full-time job is to write and
> optimize code. The fact that R's speed is of the same order of magnitude as
> "some other program" is itself remarkable.

  So?
> 
> ...in sum, it is possible that a proprietary software system may produce a
> result of the form you expect faster than R does, but it is unlikely to be
> MUCH faster when you consider the entire data analysis process, and there
> are several strong arguments for waiting the few extra seconds.

  "It depends."
> 
> --Adam
> 
> On Wed, 23 Sep 2009, Douglas Bates wrote:
> 
>> I forgot to cc: the list on this reply.
>>
>> ---------- Forwarded message ----------
>> From: Douglas Bates <bates at stat.wisc.edu>
>> Date: Wed, Sep 23, 2009 at 11:29 AM
>> Subject: Re: [R-sig-ME] More naive questions: Speed comparisons? what
>> is a "stack imbalance" in lmer? does lmer center variables?
>> To: Paul Johnson <pauljohn32 at gmail.com>
>>
>>
>>
>> On Wed, Sep 23, 2009 at 1:36 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>>> Sent this to r-sig-debian by mistake the first time.  Depressing.
>>> 1.  One general question for general discussion:
>>> Is HLM6 faster than lmer? If so, why?
>>> I'm always advocating R to students, but some faculty members are
>>> skeptical.  A colleague compared the commercial HLM6 software to lmer.
>>>  HLM6 seems to fit the model in 1 second, but lmer takes 60 seconds.
>>> If you have HLM6 (I don't), can you tell me if you see similar differences?
>> Honestly, Paul, I don't see a point in discussing vague generalities
>> and rumors.  If your colleague can make the data (or even simulated
>> data with a similar structure) available and describe the model being
>> fit then we can see why lmer appears to be so much slower on it.  But
>> do bear in mind that the folks marketing HLM6 have the ability -
>> indeed they have the right - to run lmer on their test cases and to
>> examine every single line of code in lmer so they can determine
>> exactly what it is doing.  They have the right to do that because
>> everyone has the right to do that.  If we want to find out how HLM6
>> performs on a test case we are reduced to your approach of asking if
>> someone has a copy of the software (on one of, or perhaps the only,
>> operating systems under which it runs) and all that can be done is to
>> quote the results.  The internal details of HLM6 are proprietary.
>>
>> I can tell you my goals in developing the lme4 package are:
>>  1. Provide the ability to fit general forms of mixed-effects models.
>>  2. To the best of my ability, provide reliable estimates.
>>  3. Allow for models to be fit to very large data sets.
>>  4. Take advantage of the R environment to allow for straightforward
>> model specification and for very general data manipulation and
>> graphical capabilities.
>>  5. Subject to objectives 1-4, provide the fastest software that I can.
>>
>>> My first thought was that HLM6 uses PQL by default, and it would be
>>> faster.  However, in the output, HLM6 says:
>>> Method of estimation: restricted maximum likelihood
>>> But that doesn't tell me what quadrature approach they use, does it?
>> To me this is very confusing.  In my world REML is only meaningful for
>> linear mixed models.  (I know, at one time Mary Lindstrom and I wrote
>> about REML estimates for nonlinear mixed models but, as I have said
>> before, I consider that a youthful indiscretion.)  For linear mixed
>> models there is no need to use PQL or to discuss quadrature because
>> the evaluation of the profiled deviance or the profiled REML criterion
>> is a direct calculation.
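
  [For concreteness, a sketch of why this is a direct calculation,
using the notation of the lme4 documentation (my reconstruction; the
details should be checked against Doug's papers): for a given value of
the covariance parameters $\theta$, the profiled deviance and profiled
REML criterion are

  $$\tilde{d}(\theta) = \log|L_\theta|^2 + n\left[1 + \log\frac{2\pi r^2(\theta)}{n}\right]$$

  $$\tilde{d}_R(\theta) = \log|L_\theta|^2 + \log|R_X|^2 + (n-p)\left[1 + \log\frac{2\pi r^2(\theta)}{n-p}\right]$$

where $L_\theta$ is the sparse Cholesky factor, $R_X$ the Cholesky
factor for the fixed effects, $r^2(\theta)$ the penalized residual sum
of squares, $n$ the number of observations, and $p$ the number of
fixed-effects parameters.  Both are evaluated by direct linear algebra,
with no integration, hence no need for PQL or quadrature.]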
>>
>> Again, without a more complete description of the data and the model
>> to be fit it is impossible to discuss these issues meaningfully.
>>
>>> Another explanation for the difference in time might be the way HLM6
>>> saves the results of some matrix calculations and re-uses them behind
>>> the scenes.  If every call to lmer is re-calculating some big matrix
>>> results, I suppose that could explain it.
>> But you don't need to speculate about what lmer does.  It is Open
>> Source so you can check for yourself.
>>
>> However, this does bring up another point which is the need to compare
>> apples with apples when you are benchmarking software.  If the data
>> import and model specification stages in HLM6 create the necessary
>> matrix structures for performing the iterative fit then does the time
>> to fit the model consist solely of the optimization and summary
>> stages?  Using an expression like
>>
>> system.time(fm1 <- lmer(...))
>>
>> is assessing the time to take the original data, which could be in a
>> very general form, create all those internal structures and perform
>> the optimization.
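
  To compare apples with apples one could time the stages separately; a
minimal sketch (the file and variable names here are hypothetical):

    library(lme4)
    ## stage 1: data import/preparation
    t.prep <- system.time(dat <- read.csv("mydata.csv"))
    ## stage 2: model-structure construction plus optimization
    t.fit <- system.time(fm <- lmer(y ~ x + (1 | g), data = dat))
    t.prep; t.fit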
>>
>> You should bear in mind that a lot of that construction of the model
>> structures is written in R exactly so that it is capable of fitting
>> very general model specifications.  The code in HLM6 is, I imagine,
>> compiled code, which is possible because it targets a very specific
>> task, and compiled code is always going to be much faster than
>> interpreted code.
>>
>>> There are comparisons from 2006 here
>>> http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/tables.shtml
>>> that indicate that lme was much slower than HLM, but that doesn't help
>>> me understand *why* there is a difference.
>> I believe that those examples are available for testing with lmer in
>> the mlmRev package for R.  However, the results can't be compared
>> meaningfully to the tabled results because today's hardware is so
>> different from what was used for those comparisons.
>>
>> Besides, none of those examples is really challenging.  I was just
>> doing some timing tests on a model fit to 1.7 million observations
>> with 750,000 random effects associated with four different, non-nested
>> grouping factors and 40 fixed-effects parameters.  To me that is a
>> meaningful test because it took from 1 to 2 hours to fit on a fast
>> server computer.  Unfortunately, I can't make that data available
>> because of confidentiality restrictions but, even if I could, I don't
>> think the model could be fit in HLM6 or MLWin or SAS PROC MIXED or
>> Stata because there are 750,000 random effects in a non-nested
>> configuration.
>>
>> I can make a similar but smaller test case available using the star
>> (Student-Teacher Achievement Ratio) data from the mlmRev package.  One
>> model could be
>>
>>> system.time(fm1 <- lmer(math ~ gr + cltype + sx + eth + (1|id) + (1|tch) + (1|sch), star))
>>   user  system elapsed
>>  9.885   0.044  10.508
>>> print(fm1, corr = FALSE)
>> Linear mixed model fit by REML
>> Formula: math ~ gr + cltype + sx + eth + (1 | id) + (1 | tch) + (1 | sch)
>>   Data: star
>>    AIC    BIC  logLik deviance REMLdev
>>  239236 239365 -119602   239245  239204
>> Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  id       (Intercept) 1001.60  31.648
>>  tch      (Intercept)  295.03  17.176
>>  sch      (Intercept)  104.78  10.236
>>  Residual              397.32  19.933
>> Number of obs: 24578, groups: id, 10732; tch, 1374; sch, 80
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept) 560.8868     1.5745   356.2
>> gr.L         95.6126     1.0080    94.9
>> gr.Q         -4.6783     0.9857    -4.7
>> gr.C         -3.2320     0.9729    -3.3
>> cltypereg    -7.7943     1.3322    -5.9
>> cltypereg+A  -7.0055     1.3265    -5.3
>> sxF           2.8766     0.6836     4.2
>> ethB        -21.9197     1.2219   -17.9
>> ethA          3.1940     6.7037     0.5
>> ethH          4.5411     9.6653     0.5
>> ethI        -28.1618    13.5359    -2.1
>> ethO          3.4356     8.3133     0.4
>>
>> Unfortunately, because that model is based on only 25000 observations
>> and 11000 random effects it is difficult to get a meaningful timing.
>> Timings under 10 seconds, like this, are subject to too much
>> variability.
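
  One way to reduce that variability is to repeat the fit and take the
median elapsed time; a minimal sketch, assuming the lme4 and mlmRev
packages are installed:

    library(lme4)
    library(mlmRev)   ## provides the 'star' data
    times <- replicate(5,
        system.time(lmer(math ~ gr + cltype + sx + eth +
                             (1 | id) + (1 | tch) + (1 | sch),
                         star))["elapsed"])
    median(times)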
>>
>> Nevertheless you can take that data and try to fit that model in any
>> of the other software systems.  Note that student (the "id" factor) is
>> not nested within teacher ("tch") or school, as will almost always
>> happen when fitting longitudinal data so the random effects are
>> partially crossed.  In other words, they are not nested so the
>> structure is not hierarchical or multi-level in the sense of HLM or
>> MLWin.  I would invite those who have access to such software to fit
>> the model using lme4 and to fit an equivalent model in other systems
>> and tell us how they compare.  I can't do that because I choose not to
>> use proprietary software.
>>
>>> 2. What does "stack imbalance in .Call" mean in lmer?
>> It is a sign of a bug in the C code underlying the lme4 package.  It
>> should be reported as a bug if it occurred in the currently released
>> version of lme4.
>>
>>> Here's why I ask.  Searching for comparisons of lmer and HLM, I went
>>> to CRAN and checked this document:
>>>
>>> http://cran.r-project.org/web/packages/mlmRev/vignettes/MlmSoftRev.pdf
>>>
>>> I *think* these things are automatically generated.  The version
>>> that's up there at this moment  (mlmRev edition 0.99875-1)  has pages
>>> full of the error message:
>>>
>>> stack imbalance in .Call,
>>>
>>> Were those always there?  I don't think so.   What do they mean?
>> No they shouldn't be there.  I'll update the mlmRev package.
>>
>>> 3. In the HLM6 output, there is a message at the end of the variable list:
>>>
>>> '%' - This level-1 predictor has been centered around its grand mean.
>>> '$' - This level-2 predictor has been centered around its grand mean.
>>>
>>> What effect does that have on the estimates?  I believe it should have
>>> no effect on the fixed effect slope estimates, but it seems to me the
>>> estimates of the variances of random parameters would be
>>> changed.  In order to make the estimates from lmer as directly
>>> comparable as possible, should I manually center all of the variables
>>> before fitting the model?   I'm a little stumped on how to center a
>>> multi-category factor before feeding it to lmer.  Know what I mean?
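
  One possible way to do the centering by hand (a minimal sketch, not
necessarily what HLM6 does internally): a numeric predictor is
grand-mean centered by subtracting its mean, and a multi-category
factor can be expanded into indicator columns, which are then centered
column by column:

    library(mlmRev)   ## for the 'star' data used above
    ## numeric predictor x (hypothetical): dat$x.c <- dat$x - mean(dat$x)
    ## multi-category factor, e.g. ethnicity in the star data:
    X <- model.matrix(~ eth, star)[, -1]         ## indicator columns, intercept dropped
    Xc <- scale(X, center = TRUE, scale = FALSE) ## center each column at its mean
    ## the columns of Xc can then be bound to the data and used in lmer()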
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc



