[R-sig-hpc] issue with using R parallelization libraries

Saptarshi Guha saptarshi.guha at gmail.com
Sat Oct 17 03:05:29 CEST 2009


Actually, it would be nice (if possible) if you could supply the
simulation code; I would like to test it on some lab machines.
Regards,
Saptarshi
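
P.S. One thing that might be worth trying in the meantime (an untested
sketch; ff(), N, and the list of input matrices, here called mats, stand
in for your real objects): throttle parallel() by hand so that only a
core's worth of forks is alive at any one time, e.g.

library(multicore)

cores <- 8                        # match the machine
res   <- vector("list", N)
for (start in seq(1, N, by = cores)) {
    idx  <- start:min(start + cores - 1, N)
    jobs <- lapply(idx, function(i) parallel(ff(mats[[i]])))
    res[idx] <- collect(jobs)     # block until this batch finishes
}

For mclapply(), it may also be worth checking that mc.preschedule is
TRUE, so the input is split into one chunk per core up front (#cores
forks in total) rather than forked once per element:

res <- mclapply(1:N, function(i) ff(mats[[i]]),
                mc.preschedule = TRUE, mc.cores = cores)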


On Fri, Oct 16, 2009 at 5:59 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote:
> Interesting.  An unfortunate fact of parallel programming is that no single
> library/hardware platform works well for all applications.
>
> I'm about to release to CRAN the official version of my Rdsm
> shared-memory parallel package, for which I released an alpha version
> (not on CRAN) a couple of months ago.  The new version is much faster
> than the old one.  If you are interested, I'd like to test your code on
> Rdsm.  Same for others who may have some problematic code.  I'm not
> saying Rdsm will be faster (unlikely), but it would be interesting to
> see how it does.
>
> Norm Matloff
>
> On Fri, Oct 16, 2009 at 04:47:49PM -0400, Glenn Blanford wrote:
>> Looking for advice on parallel techniques for R.
>>
>> I am presently parallelizing R code from a package we use locally for analysis.
>> The multicore library routines parallel() and mclapply() give mixed results.
>> For starters, only 2 to 8 cores (one processor) are available.
>>
>> Setting up an iterative for (i in 1:N) loop over a function ff(), where ff() accepts some matrices and does some number crunching:
>> parallel() forks as many child R processes as N and leaves them around until collect() reads the pids, gathers the returned values, and deposits them; a cleanup of the processes then ensues.
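>> In outline (ff() and a list of input matrices, mats, stand in for the real code):
>>
>> library(multicore)
>> jobs <- lapply(1:N, function(i) parallel(ff(mats[[i]])))  # forks all N children up front
>> res  <- collect(jobs)    # reads the pids, gathers the returned values
>>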
>> mclapply() creates and kills child R processes dynamically, only as many as there are cores, returning values that accumulate in the result vector.
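>> Roughly (same placeholders):
>>
>> res <- mclapply(1:N, function(i) ff(mats[[i]]), mc.cores = 8)
>>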
>> With large values of N, parallel() causes problems: there are too many processes to manage, and it cannot pipeline just a few at a time, so all N run in parallel instead of #cores (or something in between).  If N = 500, the system slows to the point where it basically has to be rebooted.
>> With large values of N, mclapply() runs to completion OK, but it drags out the system time (from system.time()) because it is continually managing process creation/teardown/data exchange, so again it is not efficient.  Total elapsed time with mclapply() becomes greater, not less.
>>
>> Does anyone know how to get around this (without changing a huge amount of R code)?  If anyone on the list has experience with this, it would be great to learn how to throttle the process management, or any other way of speeding things up.  Thanks greatly.
>> I have started looking at snowfall, Rparallel, etc. to see if there are better ways of managing the worker processes.
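>> With snowfall, for instance, I gather it would look something like this (a sketch based on its docs; I haven't benchmarked it yet):
>>
>> library(snowfall)
>> sfInit(parallel = TRUE, cpus = 8)   # one worker per core
>> sfExport("ff", "mats")              # ship the function and data to the workers
>> res <- sfLapply(1:N, function(i) ff(mats[[i]]))
>> sfStop()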