[R-sig-ME] Computational speed - MCMCglmm/lmer

Douglas Bates bates at stat.wisc.edu
Wed Jun 23 17:36:38 CEST 2010


On Tue, Jun 22, 2010 at 3:06 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> On Sat, Jun 19, 2010 at 10:42 AM, David Atkins <datkins at u.washington.edu> wrote:
>>
>> Hi all--
>>
>> I use (g)lmer and MCMCglmm on a weekly basis, and I am wondering about
>> options for speeding up their computations.  This is primarily an issue with
>> MCMCglmm, given the many MCMC iterations needed to reach convergence on
>> some problems.  But even with glmer(), I have runs that take 20-30
>> minutes.
>>
>> 3. "Optimized" BLAS: There's a bit of discussion about optimized BLAS (basis
>> linear algebra... something).  However, these discussions note that there is
>> no generally superior BLAS.  Not sure whether specific BLAS might be
>> optimized for GLMM computations.
>>
>> 4. Parallel computing: With multi-core computers, it looks like there are
>> some avenues for splitting intensive computations across processors.
>
> Hi, Dave:
>
> I've wondered about this same thing. I replaced the base R BLAS with
> GotoBLAS2 and with ATLAS, and both are much faster than R's base BLAS.
> With GotoBLAS2, computation is about 10x faster on linear algebra
> problems, especially on the kinds of problems where it can thread
> computations across all cores.  The BLAS library from ATLAS does not
> seem to thread, so it is not quite as fast.
>
> In either case, I've tested your example on this Lenovo T61 laptop
> with a dual-core Pentium that maxes out at 2.4GHz.
>
> To fit your model with the base R BLAS:
>
> drk.glmer
>
>  user  system elapsed
>  29.920   0.120  30.245
>
>
> The elapsed time with the optimized BLAS is not as much faster as I
> had expected.  With ATLAS it is:
>
>   user  system elapsed
>  25.660   0.100  25.784
>
> GotoBLAS2 is almost identical, which surprises me a bit.  On other tests
> I've done, it supplies a more noticeable speedup because it can go
> multi-core when needed.  Here I was monitoring the CPU, and the
> calculations all stay on one core.
>
>   user  system elapsed
>  25.670   0.050  25.725
>
> Well, if you use ATLAS or GotoBLAS2, you can expect a speedup of about
> one-sixth.

On this particular model/data-set combination, that is.  Accelerated
BLAS change the speed of low-level numerical linear algebra
operations, the so-called basic linear algebra subroutines.  If those
operations are the bottleneck in your calculation, you will see a
performance boost.  If they are not, you won't.
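As a concrete illustration (a minimal sketch; the matrix size and the
scalar loop are arbitrary choices, not taken from the original
example), you can see the difference by timing a BLAS-bound operation
next to one that barely touches the BLAS:

  ## A BLAS-bound operation: crossprod(X) computes X'X through the
  ## BLAS, so an accelerated BLAS should speed it up noticeably.
  set.seed(1)
  X <- matrix(rnorm(2000 * 500), nrow = 2000)
  system.time(crossprod(X))

  ## A computation that makes almost no use of the BLAS; switching
  ## BLAS libraries will barely change this timing.
  system.time(for (i in 1:1e6) sqrt(i))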

Accelerated BLAS are not a panacea.  Neither is parallel computation.
When your computation is essentially single-threaded, as the
optimization in a model fit like this is, it doesn't matter whether
you have one core or twelve.
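Where multiple cores do pay off is at a coarser grain, for instance
running independent MCMC chains side by side.  A sketch, assuming the
'parallel' package (shipped with recent versions of R; note that
mclapply does not fork on Windows, where mc.cores must be 1) and
placeholder data and formulas that are not from the original example:

  library(parallel)
  library(MCMCglmm)
  ## Four independent chains, one per core; 'dat', the formulas, and
  ## the iteration counts are placeholders.
  chains <- mclapply(1:4, function(i)
      MCMCglmm(drinks ~ 1, random = ~ id, data = dat,
               nitt = 13000, burnin = 3000, thin = 10,
               verbose = FALSE),
      mc.cores = 4)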

The basic rule of optimizing program performance is to profile
*before* you make changes.  Making great efforts to optimize an
operation that takes only 5% of the execution time will provide at
most a 5% gain in performance.
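In R the built-in profiler makes that check easy.  A minimal sketch,
with a placeholder glmer() call standing in for whatever fit is slow
for you ('y', 'x', 'id', and 'dat' are not from the original example):

  library(lme4)
  Rprof("fit.prof")                 # start profiling to a file
  fit <- glmer(y ~ x + (1 | id), data = dat, family = binomial)
  Rprof(NULL)                       # stop profiling
  summaryRprof("fit.prof")$by.self  # where the time actually goes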

> I made the mistake of running the MCMCglmm example in your code.
> The system is locked in mortal combat with that.  I didn't notice
> that your time was 1208 before I started that one.  :(

Forgive me for sounding grouchy, but I find this whole discussion
misguided.  Worrying about the speed of fitting a model and the
niceties of the model formulation before doing elementary checks on
the data is putting the cart before the horse.  Why is gender coded as
0, 1, and 2?  Why, when there was a maximum of 90 days of monitoring,
is there one id with 435 observations and another with 180
observations?  Did someone really have 45 drinks in one day and, if
so, are they still alive?  Accelerated BLAS and parallel algorithms
are way, way down the list of issues that should be addressed.
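Those elementary checks take only a few lines.  A sketch, assuming the
data frame is called 'drinks.df' with columns 'gender', 'id', and
'drinks' (the names are guesses from the discussion, not from the
original posting):

  ## Elementary data checks, before any model fitting.
  table(drinks.df$gender)                  # expect two codes, not three
  summary(as.vector(table(drinks.df$id)))  # obs per id vs. 90-day max
  range(drinks.df$drinks)                  # 45 drinks in a day is suspect
  subset(drinks.df, drinks > 30)           # inspect the extreme records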



