[R-sig-hpc] "chunking" parallel tasks

Tue Jan 26 19:33:24 CET 2010

>> duplicated task, invalidate the latter and kill it.
should be 'duplicated chunk'

On Tue, Jan 26, 2010 at 1:32 PM, Saptarshi Guha
<saptarshi.guha at gmail.com> wrote:
> One other approach, were the computation per chunk runs into
> several(tens of ) minutes, is to monitor the running time of long
> running tasks(each working on a chunk), if greater than a cut off,
> split the chunk and assign to unused (or lesser loaded) machines. If a
> task for a particular chunk finishes earlier than some task for a
> duplicated task, invalidate the latter and kill it.
> Of course, the run time for a chunk should be greater(much) than the
> cost of duplicating a chunk, reading it in and starting new tasks. To
> implement this, one would have write a system which actually monitors
> the running time tasks, the chunking and duplication.
>
> I think Hadoop Mapreduce does something similar, though it is most
> certainly not the best tool for some tasks.
>
> Regards
> Saptarshi
>
> On Tue, Jan 26, 2010 at 12:54 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 01/26/2010 09:51 AM, Norm Matloff wrote:
>>> There are some very sophisticated strategies for chunking out there in
>>> the parallel processing world, aimed at dealing with the load balancing
>>> problem.  The idea is to use very large chunks at the beginning and
>>> middle stages of the computation, to minimize communication overhead and
>>> the like, but then switch to smaller ones near the end in order to keep
>>> all the processes busy.  For example, the "guided" option in OpenMP does
>>> this.
>>>
>>> However, in my experience, it is seldom necessary to resort to this, as
>>> load balancing is typically not a problem.  One can even show this
>>> theoretically:  Say task times are T[1], T[2], etc., and are i.i.d.  The
>>> standard deviation of sum(T[1:k]), divided by the mean, goes to 0 as k
>>> goes to infinity--i.e. the sum is essentially constant.  Of course, that
>>> is an idealized model, but again in practice I have found that load
>>> balancing is not much of an issue.
>>>
>>> For that reason and because of the communication overhead, in most cases
>>> it is actually faster to statically assign 1/p of the tasks to each of
>>> the p processes, i.e. not do chunking.
>>
>> Agreed, and for the matrix operations Mark hinted at one likely gets the
>> most benefit by using R's vectorized matrix operations (rather than
>> *apply), then by using vectorized blas/lapack libraries, then multicore.
>> Probably distributing tasks across nodes in a cluster (e.g., mpi) for
>> matrix operations is almost always a losing proposition -- communication
>> costs are just too high.
>>
>> Martin
>>>
>>> Norm Matloff
>>>
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>