[R-sig-ME] multicore lmer: any changes since 2007?
Liaw, Andy
andy_liaw at merck.com
Wed Jul 6 19:51:55 CEST 2011
> From: r-sig-mixed-models-bounces at r-project.org
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
> Sent: Wednesday, July 06, 2011 1:41 PM
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] multicore lmer: any changes since 2007?
>
> On 07/06/2011 11:40 AM, Mike Lawrence wrote:
> > Thanks for the detailed reply.
> >
> > According to the comment by Joshua Ulrich (one of the DEoptim
> > developers) on this stackoverflow post
> > (http://stackoverflow.com/questions/3759878/parallel-optimization-in-r),
> > it seems that DEoptim might be a parallelizable optimizer, and as I
> > recall you can put box constraints on parameters with DEoptim. I
> > just sent off an email to the DEoptim developers to see if there's
> > been any progress on the parallel front.
> >
> > Mike
>
> I have used differential evolution in the past (although not in
> decades (!!)), though not via the DEoptim package, but I don't think
> DEoptim() will be appropriate for this purpose. I'm actually not
> entirely clear on what DB means by "parallel evaluation of the
> objective function". In the simplest derivative-free case, for
> example the Nelder-Mead simplex, it's hard to see how one could
> evaluate the objective function in parallel, because each evaluation
> changes the structure of the simplex and determines where the next
> evaluation should be.
My recollection of how the NM simplex works is fuzzy, but isn't it the case that at each iteration you try to "flip" the simplex in all p directions and choose the one with the steepest descent? If so, that's p points at which you need to evaluate the objective function. Or is that only needed at the beginning?
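If so, a minimal sketch of that kind of step, using mclapply() from the parallel/multicore packages (the objective f, the step size, and the starting point below are all made up for illustration):

library(parallel)   # R >= 2.14; library(multicore) also provides mclapply

## toy objective: any function from R^p to R
f <- function(x) sum((x - 1)^2)

## p trial "flips" of the current point x0, one per coordinate direction
x0 <- c(0, 0, 0)
trials <- lapply(seq_along(x0), function(i) {
    x <- x0
    x[i] <- x[i] + 0.5   # hypothetical step in direction i
    x
})

## evaluate all p trial points in parallel, keep the best
## (mclapply needs mc.cores = 1 on Windows)
vals <- unlist(mclapply(trials, f, mc.cores = 2))
best <- trials[[which.min(vals)]]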
Andy
> A very quick look at BOBYQA (source code in the minqa package, or the
> formal description at
> <http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf>) suggests
> the same one-point-at-a-time updating scheme.
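> For reference, a box-constrained call through minqa looks something
> like the following (toy objective; the bounds are only illustrative):
>
> library(minqa)
> f <- function(x) sum((x - 1)^2)            # toy objective
> fit <- bobyqa(par = c(0.5, 0.5), fn = f,
>               lower = c(0, 0), upper = c(10, 10))
> fit$par
>
> Each evaluation of fn happens one at a time inside the algorithm, so
> there is no obvious place to fan evaluations out to multiple cores.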
>
> But DB says
>
> >> In many cases you know several points where you will be evaluating
> >> the objective so you could split those off into different threads.
>
> Since he has (probably literally) forgotten more about numerical
> computation than I ever knew, he's probably right, but I don't know
> of such examples.
>
> Interesting discussion ...
>
>
> >
> > On Wed, Jul 6, 2011 at 11:59 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> >> On Tue, Jul 5, 2011 at 4:52 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
> >>> Back in 2007
> >>> (http://tolstoy.newcastle.edu.au/R/e2/help/07/05/17777.html)
> >>> Dr. Bates suggested that using a multithreaded BLAS was the only
> >>> option for speeding up lmer computations on multicore machines
> >>> (and even then, it might cause a slowdown under some
> >>> circumstances).
> >>>
> >>> Is this advice still current, or have other means of speeding up
> >>> lmer computations on multicore machines arisen in more recent years?
> >>
> >> As always, the problem with trying to parallelize a particular
> >> calculation is to determine how and when to start more than one
> >> thread.
> >>
> >> After the setup stage, the calculations in fitting an lmer model
> >> involve optimizing the profiled deviance or profiled REML criterion.
> >> Each evaluation of the criterion involves updating the components
> >> of the relative covariance factor, updating the sparse Cholesky
> >> decomposition, and solving a couple of systems of equations
> >> involving the sparse Cholesky factor.
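> >> To make that concrete, here is a minimal sketch of that
> >> per-evaluation pattern using the Matrix package directly (the
> >> matrix A below is a stand-in, not lme4's actual internal
> >> representation):
> >>
> >> library(Matrix)
> >> ## a sparse, symmetric, positive-definite stand-in matrix
> >> A <- crossprod(rsparsematrix(100, 50, density = 0.1)) + Diagonal(50)
> >> L <- Cholesky(A, super = FALSE)   # simplicial sparse factorization
> >> ## at each new parameter value: refactor with the same sparsity
> >> ## pattern (here A + 1*I stands in for the updated matrix) ...
> >> L <- update(L, A, mult = 1)
> >> ## ... then solve linear systems with the updated factor
> >> x <- solve(L, rnorm(50))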
> >>
> >> There are a couple of calculations involving dense matrices but in
> >> most cases the time spent on them is negligible relative to the
> >> calculations involving the sparse matrices.
> >>
> >> A multithreaded BLAS will only help speed up the calculations on
> >> dense matrices. The "supernodal" form of the Cholesky factorization
> >> can use the BLAS for some calculations, but usually on small
> >> blocks. Most of the time the software chooses the "simplicial" form
> >> of the factorization, because the supernodal form would not be
> >> efficient, and the simplicial form doesn't use the BLAS at all.
> >> Even if the supernodal form is chosen, the block sizes are usually
> >> small, and a multithreaded BLAS can actually slow down operations
> >> on small blocks because the communication and synchronization
> >> overhead cancels out any gain from using multiple cores.
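> >> One way to check this empirically is to pin the BLAS thread count
> >> and time the same fit twice. This sketch assumes the RhpcBLASctl
> >> package is available; with other BLAS builds, an environment
> >> variable such as OMP_NUM_THREADS plays the same role:
> >>
> >> library(RhpcBLASctl)
> >> library(lme4)
> >> blas_set_num_threads(1)
> >> t1 <- system.time(lmer(Reaction ~ Days + (Days | Subject),
> >>                        data = sleepstudy))
> >> blas_set_num_threads(4)
> >> t4 <- system.time(lmer(Reaction ~ Days + (Days | Subject),
> >>                        data = sleepstudy))
> >> rbind(one_thread = t1, four_threads = t4)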
> >>
> >> Of course, your mileage may vary, and only by profiling both the R
> >> code and the compiled code will you be able to determine how things
> >> could be sped up.
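> >> For the R side, the base profiler gives a first look (the fit below
> >> is just an example); the compiled C code needs an external profiler
> >> such as oprofile or gprof, which Rprof cannot see into:
> >>
> >> Rprof("lmer-prof.out")
> >> fit <- lme4::lmer(Reaction ~ Days + (Days | Subject),
> >>                   data = lme4::sleepstudy)
> >> Rprof(NULL)
> >> head(summaryRprof("lmer-prof.out")$by.self)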
> >>
> >> If I had to guess, I would say that the best hope for parallelizing
> >> the computation would be to find an optimizer that allows for
> >> parallel evaluation of the objective function. The lme4 package
> >> requires optimization of a nonlinear objective subject to "box
> >> constraints" (meaning that some of the parameters can have upper
> >> and/or lower bounds). Actually it is simpler than that: some of the
> >> parameters must be positive. We do not provide gradient
> >> evaluations. I once* worked out the gradient of the criterion (I
> >> think it was the best mathematics I ever did) and then found that
> >> it ended up slowing the optimization to a crawl in the difficult
> >> cases. A bit of reflection showed that each evaluation of the
> >> gradient could be hundreds or thousands of times more costly than
> >> an evaluation of the objective itself, so you might as well use a
> >> gradient-free method and just do more function evaluations. In many
> >> cases you know several points where you will be evaluating the
> >> objective, so you could split those off into different threads.
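> >> As a sketch of what a population-based, box-constrained optimizer
> >> looks like from the user's side (DEoptim here, with a stand-in
> >> dev() in place of the real profiled deviance, and no claim that
> >> DEoptim evaluates its population in parallel):
> >>
> >> library(DEoptim)
> >> ## stand-in for the profiled deviance as a function of the
> >> ## variance parameters theta, with the box constraint theta >= 0
> >> dev <- function(theta) sum((theta - c(0.5, 1.5))^2)
> >> fit <- DEoptim(dev, lower = c(0, 0), upper = c(5, 5),
> >>                control = DEoptim.control(itermax = 50, trace = FALSE))
> >> fit$optim$bestmem
> >>
> >> Each DE generation produces a whole population of candidate points
> >> whose objective values are independent of one another, which is
> >> exactly the "several points known in advance" situation described
> >> above.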
> >>
> >> I don't know of such a multithreaded optimizer (many of the
> >> optimizers that I find are still being written in Fortran 77, God
> >> help us), but that would be my best bet if one could be found.
> >> However, I am saying this without having done the profiling of the
> >> calculation myself, so that is still a guess.
> >>
> >> * Bates and DebRoy, "Linear mixed models and penalized least
> >> squares", Journal of Multivariate Analysis 91 (2004), 1-17