[R-sig-hpc] "chunking" parallel tasks

Martin Morgan mtmorgan at fhcrc.org
Tue Jan 26 18:54:16 CET 2010

On 01/26/2010 09:51 AM, Norm Matloff wrote:
> There are some very sophisticated strategies for chunking out there in
> the parallel processing world, aimed at dealing with the load balancing
> problem.  The idea is to use very large chunks at the beginning and
> middle stages of the computation, to minimize communication overhead and
> the like, but then switch to smaller ones near the end in order to keep
> all the processes busy.  For example, the "guided" option in OpenMP does
> this.
> However, in my experience, it is seldom necessary to resort to this, as
> load balancing is typically not a problem.  One can even show this
> theoretically:  Say task times are T[1], T[2], etc., and are i.i.d.  The
> standard deviation of sum(T[1:k]), divided by the mean, goes to 0 as k
> goes to infinity--i.e. the sum is essentially constant.  Of course, that
> is an idealized model, but again in practice I have found that load
> balancing is not much of an issue.
> For that reason and because of the communication overhead, in most cases
> it is actually faster to statically assign 1/p of the tasks to each of
> the p processes, i.e. not do chunking.

Agreed, and for the matrix operations Mark hinted at one likely gets the
most benefit by using R's vectorized matrix operations (rather than
*apply), then by using vectorized blas/lapack libraries, then multicore.
Probably distributing tasks across nodes in a cluster (e.g., mpi) for
matrix operations is almost always a losing proposition -- communication
costs are just too high.

> Norm Matloff
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

More information about the R-sig-hpc mailing list