[Bioc-devel] parLapplyLB: Load balancing?
Ulrich Bodenhofer
bodenhofer at bioinf.jku.at
Fri Jul 19 12:20:46 CEST 2013
[cross-posted on R-devel and Bioc-devel, since the functions from the
parallel package discussed here are mirrored in the BiocGenerics package]
Hi,
I am currently running a lengthy simulation study (no details necessary)
on a large multi-core system. The simulated data sets are stored in a
long list and they are unevenly sized (hence, the computation times vary
greatly between data sets), so I naively thought that parLapplyLB()
would be the right function to parallelize my computations. Presently,
only two thirds of my computations are finished, however, only 16 of my
72 workers seem to be active, whereas the remaining 56 seem to run idle.
I tried to find out in more detail what parLapplyLB() actually does and
I was astonished to see that, in the same way as parLapply(), it splits
the input list X into as many parts as there are nodes in the cluster
and then calls clusterApplyLB() on these chunks. By the way, parLapply()
does exactly the same, but then calls clusterApply(). I may miss a
point, but this means to me, since there are as many chunks as nodes in
the cluster, that actually no load balancing takes place, right? My list
of data sets are actually ordered by size, that's why the small ones are
grouped together and the large ones are grouped together - with the
obvious effect that the workers with the chunks or small data sets have
finished long ago while the workers with the longest data sets are still
not finished and will take quite some more time to complete.
First question: is this behavior of parLapplyLB() really intended? As it
seems to me, clusterApplyLB() is on the other side of the spectrum,
since it distributes every single data item dynamically and, therefore,
supposedly creates quite some communication overhead. Why not adding an
argument to parLapplyLB() that controls the size or number of chunks the
input list X is split into? Additionally (or alternatively),
communication overhead can be reduced by sending all additional
arguments to the workers once and for all (a trick I have seen on page
20 of "Parallel R" by QE McCallum). Here is my version of it (the only
difference to McCallum's code is that I moved LB.init() and LB.worker()
to the inside of parLapplyLB2()):
parLapplyLB2 <- function(cl, x, fun, ...)
{
LB.init <- function(fun, ...)
{
assign(".LB.fun", fun, pos=globalenv())
assign(".LB.args", list(...), pos=globalenv())
NULL
}
LB.worker <- function(x) do.call(".LB.fun", c(list(x), .LB.args))
clusterCall(cl, LB.init, fun, ...)
r <- clusterApplyLB(cl, x, LB.worker)
clusterEvalQ(cl, rm(".LB.fun", ".LB.args", pos=globalenv()))
r
}
The issue described above motivated me to learn more about how load
balancing is done with R's parallel package. I do not want to bother you
with details, but I can provide my code and results upon request. My
result is that clusterApply(), clusterApplyLB(), and parLapplyLB2() from
above (which is essentially a wrapper around clusterApplyLB()) are
equally bad if the tasks are ordered ascendingly according to the time
they take. In any case, however, parLapply() and parLapplyLB() are even
worse (because of the grouping effect highlighted above). If nothing is
known about how long the tasks take (or if the tasks are in random
order), clusterLapplyLB() and parLapplyLB2() provide quite good results,
but still not a linear speedup. If the tasks are ordered descendingly in
terms of the time they take, clusterLapplyLB() and parLapplyLB2()
provide the best results because all workers start with the tasks that
take longest and then, with increasing time, shorter tasks can be
distributed more evenly over idle workers. I suppose this is not
surprising and probably common knowledge. I just wanted to make the
point that also empirical evaluations confirmed that parLapply() and
parLapplyLB() perform equally bad when the lengths of tasks vary. In the
case of parLapply(), this is to be expected. In the case of
parLapplyLB(), this is probably what users would not expect - given the
misleading "LB" in the name of the function.
Regards,
Ulrich
------------------------------------------------------------------------
*Dr. Ulrich Bodenhofer*
Associate Professor
Institute of Bioinformatics
*Johannes Kepler University*
Altenberger Str. 69
4040 Linz, Austria
Tel. +43 732 2468 4526
Fax +43 732 2468 4539
bodenhofer at bioinf.jku.at <mailto:bodenhofer at bioinf.jku.at>
http://www.bioinf.jku.at/ <http://www.bioinf.jku.at>
More information about the Bioc-devel
mailing list