[R-sig-ME] multicore lmer: any changes since 2007?

Douglas Bates bates at stat.wisc.edu
Wed Jul 6 19:51:51 CEST 2011


On Wed, Jul 6, 2011 at 12:40 PM, Ben Bolker <bbolker at gmail.com> wrote:
> On 07/06/2011 11:40 AM, Mike Lawrence wrote:
>> Thanks for the detailed reply.
>>
>> According to the comment by Joshua Ulrich (one of the DEoptim
>> developers) on this stackoverflow post
>> (http://stackoverflow.com/questions/3759878/parallel-optimization-in-r),
>> it seems that DEoptim might be a parallel-izable optimizer, and as I
>> recall you can put box constraints on parameters with DEoptim. I just
>> sent off an email to the DEoptim developers to see if there's been any
>> progress on the parallel front.
>>
>> Mike
>
>  I have used differential evolution in the past (although not in
> decades (!!)), though not via the DEoptim() package, but I don't think
> DEoptim() will be appropriate for this purpose.  I'm actually not
> entirely clear on what DB means by "parallel evaluation of the objective
> function".  In the simplest derivative-free case for example (the
> Nelder-Mead simplex), it's hard to see how one could evaluate the
> objective function in parallel because each evaluation changes the
> structure of the simplex and determines where the next evaluation should
> be.  A very quick look at BOBYQA (source code in the minqa package, or
> formal description at
> <http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf>) suggests
> the same one-point-at-a-time updating scheme.
>
>  But DB says
>
>>> In many cases you know several points where you will be
>>> evaluating the objective so you could split those
>>> off into different threads.
>
>  Since he has (probably literally) forgotten more about numerical
> computation than I ever knew, he's probably right, but I don't know of
> such examples myself.
>
>  Interesting discussion ...

You have done a better job of checking on the veracity of my
statements than I did.  I was thinking of cases where the gradient is
approximated by finite differences.  In many cases the number of
parameters in the optimization in lmer is small and you could do the
finite difference calculations in parallel.  However, you have looked
at the code, which is more than I did, and I will accept that neither
of the methods you mention permits you to spawn off multiple function
evaluations in parallel.
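
To make that concrete, something along the lines of the sketch below is
what I had in mind (untested; the objective fn, the step size h and the
use of mclapply from the parallel package are illustrative choices on my
part, not anything taken from the lme4 sources, and mclapply forks, so it
only helps on Unix-alikes):

library(parallel)

## central-difference gradient with the per-coordinate perturbations
## evaluated in separate forked processes
fdGrad <- function(fn, theta, h = 1e-6, cores = detectCores()) {
    p <- length(theta)
    g <- mclapply(seq_len(p), function(i) {
        up <- dn <- theta
        up[i] <- up[i] + h
        dn[i] <- dn[i] - h
        (fn(up) - fn(dn)) / (2 * h)   # 2 * p objective evaluations in all
    }, mc.cores = min(cores, p))
    unlist(g)
}

## e.g. fdGrad(function(x) sum((x - 1)^2), c(0, 2, 4))  # approx. 2 * (x - 1)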

Regarding the differential evolution optimization, I agree with you
that it may not be the best choice for this job.  Some perfunctory testing with
Dirk Eddelbuettel's implementation in the RcppDE package indicated
that it would end up taking longer than a classical optimizer based on
a local quadratic approximation.  I think DE is intended for more
difficult objective functions than those presented by lmer.
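
The comparison I ran was roughly of the following form (again an
untested sketch: the toy objective and box bounds are made up for
illustration, and bobyqa from the minqa package stands in for a
classical optimizer based on a local quadratic approximation):

library(RcppDE)   # DEoptim() with the same interface as the DEoptim package
library(minqa)    # bobyqa(), a derivative-free local quadratic method

obj   <- function(x) sum((x - 1)^2) + 0.1 * sum(sin(5 * x)^2)
lower <- rep(0, 4)
upper <- rep(5, 4)

## differential evolution: a population of candidate points per generation
system.time(
    de <- DEoptim(obj, lower, upper,
                  control = DEoptim.control(itermax = 200, trace = FALSE))
)

## local quadratic approximation with box constraints
system.time(
    bq <- bobyqa(par = rep(2, 4), fn = obj, lower = lower, upper = upper)
)

c(DE = de$optim$bestval, BOBYQA = bq$fval)
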
>> On Wed, Jul 6, 2011 at 11:59 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
>>> On Tue, Jul 5, 2011 at 4:52 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
>>>> Back in 2007 (http://tolstoy.newcastle.edu.au/R/e2/help/07/05/17777.html)
>>>> Dr. Bates suggested that using a multithreaded BLAS was the only
>>>> option for speeding lmer computations on multicore machines (and even
>>>> then, it might cause a slowdown under some circumstances).
>>>>
>>>> Is this advice still current, or have other means of speeding lmer
>>>> computations on multicore machines arisen in more recent years?
>>>
>>> As always, the problem with trying to parallelize a particular
>>> calculation is to determine how and when to start more than one
>>> thread.
>>>
>>> After the setup stage the calculations in fitting an lmer model
>>> involve optimizing the profiled deviance or profiled REML criterion.
>>> Each evaluation of the criterion involves updating the components of
>>> the relative covariance factor, updating the sparse Cholesky
>>> decomposition and solving a couple of systems of equations involving
>>> the sparse Cholesky factor.
>>>
>>> There are a couple of calculations involving dense matrices but in
>>> most cases the time spent on them is negligible relative to the
>>> calculations involving the sparse matrices.
>>>
>>> A multithreaded BLAS will only help speed up the calculations on dense
>>> matrices.  The "supernodal" form of the Cholesky factorization can use
>>> the BLAS for some calculations but usually on small blocks.  Most of
>>> the time the software chooses the "simplicial" form of the
>>> factorization because the supernodal form would not be efficient and
>>> the simplicial form doesn't use the BLAS at all.  Even if the
>>> supernodal form is chosen, the block sizes are usually small and a
>>> multithreaded BLAS can actually slow down operations on small blocks
>>> because the communication and synchronization overhead cancels out any
>>> gain from using multiple cores.
>>>
>>> Of course, your mileage may vary and only by profiling both the R code
>>> and the compiled code will you be able to determine how things could
>>> be sped up.
>>>
>>> If I had to guess, I would say that the best hope for parallelizing
>>> the computation would be to find an optimizer that allows for parallel
>>> evaluation of the objective function.  The lme4 package requires
>>> optimization of a nonlinear objective subject to "box constraints"
>>> (meaning that some of the parameters can have upper and/or lower
>>> bounds).  Actually it is simpler than that: some of the parameters
>>> must be positive.  We do not provide gradient evaluations.  I once*
>>> worked out the gradient of the criterion (I think it was the best
>>> mathematics I ever did) and then found that it ended up slowing the
>>> optimization to a crawl in the difficult cases.  A bit of reflection
>>> showed that each evaluation of the gradient could be hundreds or
>>> thousands of times more complex than an evaluation of the objective
>>> itself, so you might as well use a gradient-free method and just do
>>> more function evaluations.  In many cases you know several points
>>> where you will be evaluating the objective so you could split those
>>> off into different threads.
>>>
>>> I don't know of such a multithreaded optimizer (many of the optimizers
>>> that I find are still being written in Fortran 77, God help us) but
>>> that would be my best bet if one could be found.  However, I am saying
>>> this without having done the profiling of the calculation myself so
>>> that is still a guess.
>>>
>>> * Bates and DebRoy, "Linear mixed models and penalized least squares",
>>> Journal of Multivariate Analysis, 91 (2004) 1-17
>>>