[Bioc-devel] parLapplyLB: Load balancing?

Ulrich Bodenhofer bodenhofer at bioinf.jku.at
Fri Jul 19 12:20:46 CEST 2013


[cross-posted on R-devel and Bioc-devel, since the functions from the 
parallel package discussed here are mirrored in the BiocGenerics package]

Hi,

I am currently running a lengthy simulation study (no details necessary) 
on a large multi-core system. The simulated data sets are stored in a 
long list and are unevenly sized (hence, the computation times vary 
greatly between data sets), so I naively assumed that parLapplyLB() 
would be the right function to parallelize my computations. At present, 
two thirds of my computations are finished, yet only 16 of my 72 
workers appear to be active, while the remaining 56 sit idle. I tried 
to find out in more detail what parLapplyLB() actually does and was 
astonished to see that, just like parLapply(), it splits the input list 
X into as many parts as there are nodes in the cluster and then calls 
clusterApplyLB() on these chunks. (parLapply() does exactly the same, 
but then calls clusterApply().) I may be missing a point, but since 
there are as many chunks as nodes in the cluster, this means that no 
load balancing actually takes place, right? My list of data sets is 
ordered by size, so the small ones are grouped together and the large 
ones are grouped together - with the obvious effect that the workers 
that got the chunks of small data sets finished long ago, while the 
workers with the largest data sets are still running and will take 
quite some more time to complete.
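The splitting behavior described above can be seen directly: parLapply() 
and parLapplyLB() both cut X into exactly as many chunks as there are 
nodes (internally via splitIndices()), so each worker receives one fixed 
chunk and clusterApplyLB() has no spare chunks left to balance:

```r
## Sketch of the chunking: with as many chunks as nodes, there is
## nothing for the load balancer to hand out dynamically.
library(parallel)

idx <- splitIndices(72, 8)   # e.g. 72 data sets on an 8-node cluster
length(idx)                  # 8 chunks - one per node, none spare
range(lengths(idx))          # all chunks have (roughly) equal size
```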

First question: is this behavior of parLapplyLB() really intended? 
clusterApplyLB(), as it seems to me, is at the other end of the 
spectrum: it distributes every single data item dynamically and 
therefore presumably creates considerable communication overhead. Why 
not add an argument to parLapplyLB() that controls the size or number 
of chunks the input list X is split into? Additionally (or 
alternatively), communication overhead can be reduced by sending all 
additional arguments to the workers once and for all (a trick I have 
seen on page 20 of "Parallel R" by Q. E. McCallum). Here is my version 
of it (the only difference to McCallum's code is that I moved LB.init() 
and LB.worker() to the inside of parLapplyLB2()):


    parLapplyLB2 <- function(cl, x, fun, ...)
    {
        ## send fun and the extra arguments to each worker once,
        ## instead of attaching them to every single task
        LB.init <- function(fun, ...)
        {
            assign(".LB.fun", fun, pos=globalenv())
            assign(".LB.args", list(...), pos=globalenv())
            NULL
        }

        LB.worker <- function(x) do.call(".LB.fun", c(list(x), .LB.args))

        ## reset the worker function's environment; otherwise its
        ## enclosing frame (which contains fun and ...) would be
        ## serialized along with every task, defeating the purpose
        environment(LB.worker) <- globalenv()

        clusterCall(cl, LB.init, fun, ...)
        r <- clusterApplyLB(cl, x, LB.worker)
        clusterEvalQ(cl, rm(".LB.fun", ".LB.args", pos=globalenv()))
        r
    }
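The chunk-count argument suggested above could look roughly like this 
(a hypothetical sketch - the nchunks argument and the name 
parLapplyLB3() are my own invention, not part of the parallel API; 
splitting X into more chunks than there are nodes gives 
clusterApplyLB() spare chunks to hand out dynamically, at moderate 
communication cost):

```r
## Hypothetical sketch: parLapplyLB() with a tunable number of chunks.
parLapplyLB3 <- function(cl, x, fun, ..., nchunks = 4 * length(cl))
{
    args <- list(...)
    nchunks <- max(1, min(nchunks, length(x)))
    idx <- parallel::splitIndices(length(x), nchunks)
    chunks <- lapply(idx, function(i) x[i])

    ## per-chunk worker; environment reset so that the surrounding
    ## frame (x, chunks, ...) is not serialized with every chunk
    worker <- function(chunk, f, a)
        lapply(chunk, function(el) do.call(f, c(list(el), a)))
    environment(worker) <- globalenv()

    res <- parallel::clusterApplyLB(cl, chunks, worker, fun, args)
    do.call(c, res)   # flatten the per-chunk results back into one list
}
```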


The issue described above motivated me to learn more about how load 
balancing is done with R's parallel package. I do not want to bother you 
with details, but I can provide my code and results upon request. My 
finding is that clusterApply(), clusterApplyLB(), and parLapplyLB2() 
from above (which is essentially a wrapper around clusterApplyLB()) 
perform equally poorly if the tasks are ordered in ascending order of 
the time they take. In any case, however, parLapply() and parLapplyLB() 
are even worse (because of the grouping effect highlighted above). If 
nothing is known about how long the tasks take (or if the tasks are in 
random order), clusterApplyLB() and parLapplyLB2() provide quite good 
results, though still not a linear speedup. If the tasks are ordered in 
descending order of the time they take, clusterApplyLB() and 
parLapplyLB2() give the best results, because all workers start with 
the tasks that take longest, and, as time goes on, the shorter tasks 
can be distributed more evenly over idle workers. I suppose this is not 
surprising and probably common knowledge. I just wanted to make the 
point that empirical evaluations also confirm that parLapply() and 
parLapplyLB() perform equally badly when the lengths of the tasks vary. 
In the case of parLapply(), this is to be expected; in the case of 
parLapplyLB(), it is probably not what users would expect - given the 
misleading "LB" in the function's name.
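The descending-order observation above can be exploited directly when a 
per-task cost estimate is available (here I use the length of each data 
set as a hypothetical proxy for its computation time; the function name 
and the cost argument are my own sketch, not part of parallel):

```r
## Sketch: presort tasks by decreasing estimated cost so that
## clusterApplyLB() starts the longest tasks first, then restore
## the original order of the results.
clusterApplyLBsorted <- function(cl, x, fun, ..., cost = lengths(x))
{
    ord <- order(cost, decreasing = TRUE)
    res <- parallel::clusterApplyLB(cl, x[ord], fun, ...)
    out <- vector("list", length(x))
    out[ord] <- res      # res[k] belongs at position ord[k]
    out
}
```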

Regards,
Ulrich


------------------------------------------------------------------------
*Dr. Ulrich Bodenhofer*
Associate Professor
Institute of Bioinformatics

*Johannes Kepler University*
Altenberger Str. 69
4040 Linz, Austria

Tel. +43 732 2468 4526
Fax +43 732 2468 4539
bodenhofer at bioinf.jku.at
http://www.bioinf.jku.at/
