[R-sig-hpc] "chunking" parallel tasks
saptarshi.guha at gmail.com
Tue Jan 26 19:33:24 CET 2010
>> duplicated task, invalidate the latter and kill it.
should be 'duplicated chunk'
On Tue, Jan 26, 2010 at 1:32 PM, Saptarshi Guha
<saptarshi.guha at gmail.com> wrote:
> One other approach, were the computation per chunk runs into
> several(tens of ) minutes, is to monitor the running time of long
> running tasks(each working on a chunk), if greater than a cut off,
> split the chunk and assign to unused (or lesser loaded) machines. If a
> task for a particular chunk finishes earlier than some task for a
> duplicated task, invalidate the latter and kill it.
> Of course, the run time for a chunk should be greater(much) than the
> cost of duplicating a chunk, reading it in and starting new tasks. To
> implement this, one would have write a system which actually monitors
> the running time tasks, the chunking and duplication.
> I think Hadoop Mapreduce does something similar, though it is most
> certainly not the best tool for some tasks.
> On Tue, Jan 26, 2010 at 12:54 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 01/26/2010 09:51 AM, Norm Matloff wrote:
>>> There are some very sophisticated strategies for chunking out there in
>>> the parallel processing world, aimed at dealing with the load balancing
>>> problem. The idea is to use very large chunks at the beginning and
>>> middle stages of the computation, to minimize communication overhead and
>>> the like, but then switch to smaller ones near the end in order to keep
>>> all the processes busy. For example, the "guided" option in OpenMP does
>>> However, in my experience, it is seldom necessary to resort to this, as
>>> load balancing is typically not a problem. One can even show this
>>> theoretically: Say task times are T, T, etc., and are i.i.d. The
>>> standard deviation of sum(T[1:k]), divided by the mean, goes to 0 as k
>>> goes to infinity--i.e. the sum is essentially constant. Of course, that
>>> is an idealized model, but again in practice I have found that load
>>> balancing is not much of an issue.
>>> For that reason and because of the communication overhead, in most cases
>>> it is actually faster to statically assign 1/p of the tasks to each of
>>> the p processes, i.e. not do chunking.
>> Agreed, and for the matrix operations Mark hinted at one likely gets the
>> most benefit by using R's vectorized matrix operations (rather than
>> *apply), then by using vectorized blas/lapack libraries, then multicore.
>> Probably distributing tasks across nodes in a cluster (e.g., mpi) for
>> matrix operations is almost always a losing proposition -- communication
>> costs are just too high.
>>> Norm Matloff
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
More information about the R-sig-hpc