[R] No speed up using the parallel package and ncpus > 1 with boot() on linux machines
Chris Evans
chrishold at psyctc.org
Sun Oct 18 11:31:13 CEST 2015
As with Milan's answer: a perfect explanation and hugely appreciated. A few follow-up questions and comments below.
----- Original Message -----
> From: "Jeff Newmiller" <jdnewmil at dcn.davis.ca.us>
> To: "Chris Evans" <chrishold at psyctc.org>
> Cc: r-help at r-project.org
> Sent: Saturday, 17 October, 2015 18:28:12
> Subject: Re: [R] No speed up using the parallel package and ncpus > 1 with boot() on linux machines
> None of this is surprising. If the calculations you divide your work up
> into are small, then the overhead of communicating between parallel
> processes will be a relatively large penalty to pay. You have to break
> your problem up into larger chunks and depend on vector processing within
> processes to keep the CPU busy doing useful work.
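A minimal sketch of what "larger chunks plus vector processing" can look like, assuming a Unix-alike (Linux or macOS) where `parallel::mclapply()` can fork. It does not use boot() at all: the statistic (a simple mean) and the helper name `boot_mean_chunked` are hypothetical. Each worker receives one large block of replicates and computes them with vectorised code, so each forked process returns only once instead of once per replicate.

```r
library(parallel)

## Hypothetical helper: B bootstrap replicates of mean(x), split into one
## large chunk per worker so each forked process does a lot of vectorised
## work and returns only once. Unix-alikes only (mclapply forks).
boot_mean_chunked <- function(x, B, n_workers = 2L) {
  n <- length(x)
  ## Chunk sizes that sum to B and differ by at most one
  sizes <- rep(B %/% n_workers, n_workers)
  extra <- B %% n_workers
  if (extra > 0) sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1L
  one_chunk <- function(b) {
    ## Draw all b resamples at once: row i holds the indices of resample i
    idx <- matrix(sample.int(n, b * n, replace = TRUE), nrow = b)
    rowMeans(matrix(x[idx], nrow = b))
  }
  ## One task per worker rather than one task per replicate
  unlist(mclapply(sizes, one_chunk, mc.cores = n_workers))
}

## L'Ecuyer-CMRG gives each forked child its own random-number stream
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
x <- rnorm(30)
reps <- boot_mean_chunked(x, B = 10000, n_workers = 4)
quantile(reps, c(0.025, 0.975))  # percentile interval for the mean
```

The design choice is simply to make the work per task large relative to the cost of forking and returning results; with B = 10000 and 4 workers, each child computes 2500 replicates in a single vectorised pass.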
Aha. Got it!
> Also, I am not aware of any model of Mac mini that has 8 physical cores...
> 4 is the max. Virtual cores simplify the logical model of
> multiprocessing but offer little real performance gain for CPU-bound
> work, because the hardware threads on each physical core share its
> execution units.
Ah. Hadn't thought of that. It's a machine I rent; I thought it was a Mac mini. detectCores() reports 8, but perhaps those are virtual cores. /proc/cpuinfo says the processor is an Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz and shows 8 cores, but again, perhaps they are virtual. What's the best way to get a true core count?
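One way to answer this on Linux, sketched under assumptions: /proc/cpuinfo has one entry per logical CPU, and on typical x86 systems each entry carries `physical id` (socket) and `core id` fields, so counting distinct (socket, core) pairs gives the number of physical cores, with hyper-threaded siblings counted once. The helper name `count_physical_cores` is hypothetical, and virtual machines or other architectures may omit these fields, in which case it returns NA. For comparison, `parallel::detectCores()` counts logical CPUs on Linux.

```r
## Hypothetical helper: count physical cores on Linux by counting distinct
## (physical id, core id) pairs in /proc/cpuinfo. Hyper-threaded siblings
## share a pair, so they are counted once. Returns NA if the fields are absent.
count_physical_cores <- function(path = "/proc/cpuinfo") {
  lines <- readLines(path)
  field <- function(name) {
    ## Value after the colon on every line that starts with `name`
    sub("^[^:]*:[[:space:]]*", "", grep(paste0("^", name), lines, value = TRUE))
  }
  socket <- field("physical id")
  core <- field("core id")
  if (length(core) == 0L || length(core) != length(socket)) return(NA_integer_)
  length(unique(paste(socket, core)))
}

if (file.exists("/proc/cpuinfo")) {
  print(count_physical_cores())   # physical cores
  print(parallel::detectCores())  # logical CPUs (hardware threads)
}
```

On the command line, `lscpu` reports the same information as "Core(s) per socket" and "Socket(s)".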
> Note that your problems are with long-running simulations... your examples
> are too small to demonstrate the actual balance of processing vs.
> communication overhead. Before you draw conclusions, try upping bootReps
> by a few orders of magnitude, and run your test code a couple
> of times to stabilize the memory conditions and obtain some consistency
> in timings.
OK. Good advice again, but what you are saying, and the findings I had there, are pretty consistent with what I was seeing in long-running jobs with bootReps up at 10k, and I think you've told me what I really want to know. I think the simplest way to parallelise may actually be fine for me: I'll run four (or maybe eight) separate R jobs, checking for swapping to make sure I'm not pushing beyond physical RAM (I don't think these simulations will).
> I have never used the parallel option in the boot package before... I have
> always rolled my own to allow me to decide how much work to do within the
> worker processes before returning from them. (This is particularly severe
> when using snow, but not necessarily something you can neglect with
> multicore.)
That sounds like an impressive and obviously pertinent approach. As I say, though, I think I may be able to get away with a very simple approach: run the simulations in parallel, then aggregate the results from each run and analyse them.
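A minimal sketch of the "separate jobs, then aggregate" approach, under stated assumptions: the names `run_job` and `collect_results`, the file naming scheme, and the placeholder simulation (mean of 20 normal draws) are all hypothetical. Each job takes its own L'Ecuyer-CMRG stream via `parallel::nextRNGStream()` so that independently started R processes do not reuse the same random numbers, and writes its results to an .rds file that a final step stacks together.

```r
## Hypothetical per-job function: job `job_id` runs `reps` simulations and
## saves them to <out_dir>/results_<job_id>.rds. Each job advances to its own
## L'Ecuyer-CMRG stream (note: this changes the session's RNG kind).
run_job <- function(job_id, reps, out_dir, base_seed = 2015) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(base_seed)
  s <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(job_id)) s <- parallel::nextRNGStream(s)
  assign(".Random.seed", s, envir = globalenv())
  ## Placeholder simulation: mean of 20 standard normal draws per replicate
  res <- data.frame(job = job_id, rep = seq_len(reps),
                    stat = replicate(reps, mean(rnorm(20))))
  saveRDS(res, file.path(out_dir, sprintf("results_%02d.rds", job_id)))
}

## Aggregation step: read every job's file and stack the results
collect_results <- function(out_dir) {
  files <- list.files(out_dir, pattern = "^results_[0-9]+\\.rds$",
                      full.names = TRUE)
  do.call(rbind, lapply(files, readRDS))
}

out_dir <- file.path(tempdir(), "sims")
dir.create(out_dir, showWarnings = FALSE)
## Run sequentially here; in practice each call would be a separate R process
for (j in 1:4) run_job(j, reps = 250, out_dir = out_dir)
all_res <- collect_results(out_dir)
summary(all_res$stat)
```

To run the jobs as genuinely separate processes, a script could read its job number with `commandArgs(trailingOnly = TRUE)` and call `run_job()` once, with `collect_results()` run after all jobs finish.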
Many thanks Jeff. Brilliant help.
Chris
> On Sat, 17 Oct 2015, Chris Evans wrote:
>
>> I think I am failing to understand how boot() uses the parallel package on linux
... rest of my original post deleted to save space ...