[R] Problem parallelizing across cores

James Spottiswoode james using jsasoc.com
Thu Aug 29 01:55:14 CEST 2019



> On Aug 28, 2019, at 4:44 PM, James Spottiswoode <james using jsasoc.com> wrote:
> 
> Hi Bert,
> 
> Thanks for your advice. Actually, I’ve already done this and have checked out the doParallel and future packages. The trouble with doParallel is that it forks R processes which spend a lot of time loading data and packages, whereas my function runs in 100 ms, so the parallelization doesn’t help. The future package keeps its children running, but I haven’t figured out how to get it to work in my application.
> 
> Best — James
> 
> 
>> On Aug 28, 2019, at 3:39 PM, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>> 
>> 
>> I would suggest that you search on "parallel computing" at the Rseek.org site. This brought up what seemed to be many relevant hits, including, of course, the High-Performance and Parallel Computing CRAN Task View.
>> 
>> Cheers,
>> Bert
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>> 
>> 
>> On Wed, Aug 28, 2019 at 3:18 PM James Spottiswoode <james.spottiswoode using gmail.com> wrote:
>> Hi All,
>> 
>> I have a piece of well-optimized R code for doing text analysis, running
>> under Linux on an AWS instance. The code first loads a number of packages
>> and some needed data, and the actual analysis is done by a function called,
>> say, f(string). I would like to parallelize calls to this function across
>> the 8 cores of the instance to increase throughput. I have looked at the
>> packages doParallel and future but am not clear how to do this. Any method
>> that starts a new R instance when the function is called will not work for
>> me, as the time to load the packages and data is comparable to the execution
>> time of the function, so there would be no speedup. Therefore I need to keep
>> a number of instances of the R code running continuously, so that the data
>> loading occurs only once, when the R processes are first started, and
>> thereafter the function f(string) is ready to run in each instance. I hope
>> I have put this clearly.
>> 
>> I’d much appreciate any suggestions.  Thanks in advance,
>> 
>> James Spottiswoode
>> 
>> 
> 
> James Spottiswoode
> Applied Mathematics & Statistics
> (310) 270 6220
> jamesspottiswoode Skype
> james using jsasoc.com
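For reference, a minimal sketch of the persistent-worker pattern described in the original question, using the base R parallel package (a swapped-in choice: neither doParallel nor future is used here). It assumes a socket cluster of 8 workers to match the 8 cores mentioned. The package loading, the data, and the body of f() inside clusterEvalQ() are hypothetical placeholders for the real setup code. The workers start once, run the setup once, and then serve any number of parLapply() calls until stopCluster() is called.

```r
library(parallel)  # ships with base R

# Start the worker processes once; they stay alive until stopCluster().
# 8 matches the core count in the question (an assumption about the instance).
cl <- makeCluster(8)

# One-time setup, evaluated in each worker's global environment.
# Everything in this block is a HYPOTHETICAL placeholder for the real code:
# replace it with the actual library() calls, data loading, and f().
invisible(clusterEvalQ(cl, {
  # library(yourTextPackage)           # hypothetical package name
  # model <- readRDS("model.rds")      # hypothetical data file
  f <- function(string) nchar(string)  # stand-in for the real f(string)
  NULL
}))

# Later calls reuse the already-running workers: nothing is reloaded.
# The anonymous function's environment is the global environment, so on each
# worker it finds the f() that clusterEvalQ() defined there.
strings <- c("first text", "second text", "third text")
results <- parLapply(cl, strings, function(s) f(s))

# ... further batches can be sent with parLapply(cl, ...) at any time ...

stopCluster(cl)  # shut the workers down when the service ends
```

Each parLapply() call still pays a cost to serialize inputs and results over the sockets, so when strings arrive one at a time it is usually worth batching several per call. On Linux, makeCluster(8, type = "FORK") is an alternative that copies the parent session's memory, including anything already loaded, at the moment the cluster is created.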
