[R-sig-ME] multicore lmer: any changes since 2007?

Ben Bolker bbolker at gmail.com
Wed Jul 6 20:07:01 CEST 2011


On 07/06/2011 01:51 PM, Douglas Bates wrote:
> On Wed, Jul 6, 2011 at 12:40 PM, Ben Bolker <bbolker at gmail.com> wrote:
>> On 07/06/2011 11:40 AM, Mike Lawrence wrote:
>>> Thanks for the detailed reply.
>>>
>>> According to the comment by Joshua Ulrich (one of the DEoptim
>>> developers) on this stackoverflow post
>>> (http://stackoverflow.com/questions/3759878/parallel-optimization-in-r),
>>> it seems that DEoptim might be a parallelizable optimizer, and as I
>>> recall you can put box constraints on parameters with DEoptim. I just
>>> sent off an email to the DEoptim developers to see if there's been any
>>> progress on the parallel front.
>>>
>>> Mike
>>
>>  I have used differential evolution in the past (although not in
>> decades (!!)), though never via the DEoptim() package, but I don't think
>> DEoptim() will be appropriate for this purpose.  I'm actually not
>> entirely clear on what DB means by "parallel evaluation of the objective
>> function".  In the simplest derivative-free case for example (the
>> Nelder-Mead simplex), it's hard to see how one could evaluate the
>> objective function in parallel because each evaluation changes the
>> structure of the simplex and determines where the next evaluation should
>> be.  A very quick look at BOBYQA (source code in the minqa package, or
>> formal description at
>> <http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf>) suggests
>> the same one-point-at-a-time updating scheme.
>>
>>  But DB says
>>
>>>> In many cases you know several points where you will be
>>>> evaluating the objective so you could split those
>>>> off into different threads.
>>
>>  Since he has (probably literally) forgotten more about numerical
>> computation than I ever knew, he's probably right, but I don't know of
>> those examples.
>>
>>  Interesting discussion ...
> 
> You have done a better job of checking on the veracity of my
> statements than I did.  I was thinking of cases where the gradient is
> approximated by finite differences.  In many cases the number of
> parameters in the optimization in lmer is small and you could do the
> finite difference calculations in parallel.  However, you have looked
> at the code, which is more than I did, and I will accept that neither
> of the methods you mention permit you to spawn off multiple function
> evaluations in parallel.
> 
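
  A minimal sketch of that idea, with a toy objective standing in for
the lmer deviance: the forward-difference evaluations are independent
of one another, so they can be handed to mclapply() (the multicore
package at the time, parallel nowadays).  The function, step size and
the name fdGrad below are all made up for illustration:

    fdGrad <- function(fn, par, eps = 1e-7, mc.cores = 2L) {
        f0 <- fn(par)
        ## one perturbed parameter vector per coordinate -- these are
        ## the "several points" known in advance
        pts <- lapply(seq_along(par), function(i) {
            p <- par
            p[i] <- p[i] + eps
            p
        })
        ## the evaluations are independent, so run them in separate
        ## processes (mc.cores > 1 only works on Unix-alikes)
        fvals <- unlist(parallel::mclapply(pts, fn, mc.cores = mc.cores))
        (fvals - f0) / eps
    }
    fdGrad(function(p) sum(p^2), c(1, 2, 3))  # approximately 2 * par
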
> Regarding the differential evolution optimization, I agree with you
> that it may not be best for this job.  Some perfunctory testing with
> Dirk Eddelbuettel's implementation in the RcppDE package indicated
> that it would end up taking longer than a classical optimizer based on
> a local quadratic approximation.  I think DE is intended for more
> difficult objective functions than those presented by lmer.
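
  For anyone who wants to repeat that sort of comparison, a rough
sketch (toy objective and arbitrary bounds, not the lmer criterion) of
timing a local-quadratic method (bobyqa from the minqa package) against
differential evolution (DEoptim):

    library(minqa)
    library(DEoptim)
    obj <- function(x) sum((x - 0.5)^2)   # smooth, well-behaved stand-in
    lo <- rep(0, 3); hi <- rep(2, 3)      # box constraints
    system.time(fit1 <- bobyqa(rep(1, 3), obj, lower = lo, upper = hi))
    system.time(fit2 <- DEoptim(obj, lower = lo, upper = hi,
                                control = DEoptim.control(trace = FALSE)))

On a smooth objective like that the population-based method needs many
more function evaluations, which is presumably what the RcppDE timings
showed.
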

  PS.  Another possible avenue (I am not volunteering!!) would be to see
whether the forward-mode automatic differentiation code written for
Google Summer of Code 2011 <https://github.com/quantumelixir/radx> would
actually work for the derivatives in lmer ...


>>> On Wed, Jul 6, 2011 at 11:59 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
>>>> On Tue, Jul 5, 2011 at 4:52 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
>>>>> Back in 2007 (http://tolstoy.newcastle.edu.au/R/e2/help/07/05/17777.html)
>>>>> Dr. Bates suggested that using a multithreaded BLAS was the only
>>>>> option for speeding lmer computations on multicore machines (and even
>>>>> then, it might even cause a slow down under some circumstances).
>>>>>
>>>>> Is this advice still current, or have other means of speeding lmer
>>>>> computations on multicore machines arisen in more recent years?
>>>>
>>>> As always, the problem with trying to parallelize a particular
>>>> calculation is to determine how and when to start more than one
>>>> thread.
>>>>
>>>> After the setup stage the calculations in fitting an lmer model
>>>> involve optimizing the profiled deviance or profiled REML criterion.
>>>> Each evaluation of the criterion involves updating the components of
>>>> the relative covariance factor, updating the sparse Cholesky
>>>> decomposition and solving a couple of systems of equations involving
>>>> the sparse Cholesky factor.
>>>>
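
  A toy version of that per-evaluation work, using the Matrix package
at the R level (lmer does the equivalent in compiled code, and the
matrix sizes and parameter values below are invented): the symbolic
analysis of the sparsity pattern is done once, and each new value of
the covariance parameter only needs a numeric refactorization and a
solve.

    library(Matrix)
    set.seed(1)
    Zt <- rsparsematrix(50, 1000, density = 0.02)  # stand-in for the RE model matrix
    L  <- Cholesky(tcrossprod(Zt), Imult = 1)      # symbolic analysis + first factor
    for (theta in c(0.5, 1, 2)) {
        ## refactor theta^2 * Z'Z + I, reusing the symbolic analysis
        L <- update(L, theta^2 * tcrossprod(Zt), mult = 1)
        u <- solve(L, matrix(rnorm(50)))           # solve with the updated factor
    }
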
>>>> There are a couple of calculations involving dense matrices but in
>>>> most cases the time spent on them is negligible relative to the
>>>> calculations involving the sparse matrices.
>>>>
>>>> A multithreaded BLAS will only help speed up the calculations on dense
>>>> matrices.  The "supernodal" form of the Cholesky factorization can use
>>>> the BLAS for some calculations but usually on small blocks.  Most of
>>>> the time the software chooses the "simplicial" form of the
>>>> factorization because the supernodal form would not be efficient and
>>>> the simplicial form doesn't use the BLAS at all.  Even if the
>>>> supernodal form is chosen, the block sizes are usually small and a
>>>> multithreaded BLAS can actually slow down operations on small blocks
>>>> because the communication and synchronization overhead cancels out any
>>>> gain from using multiple cores.
>>>>
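
  The choice can be seen directly with the Matrix package (the matrix
below is just an arbitrary toy example): Cholesky() takes a super
argument, where NA leaves the decision to CHOLMOD's heuristic and
TRUE/FALSE force one form or the other; only the supernodal form hands
dense blocks to the BLAS.

    library(Matrix)
    set.seed(1)
    A <- forceSymmetric(crossprod(rsparsematrix(500, 500, density = 0.02)) +
                        Diagonal(500))
    class(Cholesky(A, super = NA))     # let CHOLMOD's heuristic decide
    class(Cholesky(A, super = TRUE))   # force supernodal ("dCHMsuper")
    class(Cholesky(A, super = FALSE))  # force simplicial ("dCHMsimpl"), the default
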
>>>> Of course, your mileage may vary and only by profiling both the R code
>>>> and the compiled code will you be able to determine how things could
>>>> be sped up.
>>>>
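
  Profiling the R-level code is straightforward (the model below is
just the standard sleepstudy example from lme4); profiling the compiled
CHOLMOD/BLAS code needs an external tool such as oprofile or perf
instead.

    library(lme4)
    Rprof("lmer-prof.out")
    fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
    Rprof(NULL)
    head(summaryRprof("lmer-prof.out")$by.self)
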
>>>> If I had to guess, I would say that the best hope for parallelizing
>>>> the computation would be to find an optimizer that allows for parallel
>>>> evaluation of the objective function.  The lme4 package requires
>>>> optimization of a nonlinear objective subject to "box constraints"
>>>> (meaning that some of the parameters can have upper and/or lower
>>>> bounds).  Actually it is simpler than that: some of the parameters
>>>> must be positive.  We do not provide gradient evaluations.  I once*
>>>> worked out the gradient of the criterion (I think it was the best
>>>> mathematics I ever did) and then found that it ended up slowing the
>>>> optimization to a crawl in the difficult cases.  A bit of reflection
>>>> showed that each evaluation of the gradient could be hundreds or
>>>> thousands of times more complex than an evaluation of the objective
>>>> itself so you might as well use a gradient free method and just do
>>>> more function evaluations.  In many cases you know several points
>>>> where you will be evaluating the objective so you could split those
>>>> off into different threads.
>>>>
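
  As a hedged illustration of what that objective looks like: in lme4
versions later than the one discussed in this thread, devFunOnly = TRUE
returns the profiled criterion as an ordinary R function of the
covariance parameters theta, which can then be passed to any
box-constrained, derivative-free optimizer such as bobyqa() from minqa.
For the sleepstudy example the two "diagonal" elements of theta must be
non-negative and the off-diagonal element is unconstrained:

    library(lme4)
    library(minqa)
    ## the profiled (REML) criterion as a plain function of theta;
    ## devFunOnly is not in the 2011 code base
    dd <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
               devFunOnly = TRUE)
    ## box constraints: diagonal entries of the covariance factor >= 0
    bobyqa(par = c(1, 0, 1), fn = dd,
           lower = c(0, -Inf, 0), upper = rep(Inf, 3))
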
>>>> I don't know of such a multithreaded optimizer (many of the optimizers
>>>> that I find are still being written in Fortran 77, God help us) but
>>>> that would be my best bet if one could be found.  However, I am saying
>>>> this without having done the profiling of the calculation myself so
>>>> that is still a guess.
>>>>
>>>> * Bates and DebRoy, "Linear mixed models and penalized least squares",
>>>> Journal of Multivariate Analysis, 91 (2004) 1-17
>>>>
