[R-sig-hpc] Why pure computation time in parallel is longer than the serial version?

Roger Bivand Roger.Bivand at nhh.no
Sun Feb 23 14:03:20 CET 2014


On Sun, 23 Feb 2014, Xuening Zhu wrote:

> Hi Roger:
> Your explanation is very reasonable and helpful here. Actually the
> original (default) number of BLAS threads is 2 on my computer. I just
> turned it up to 4 threads by hand later for comparison.
>
> There is another question. I'm also confused that in my experiment
> *mclapply(2 cores) + BLAS(single thread)* is *faster* than *BLAS(2
> threads)* and also faster than *mclapply(2 cores)*. That means the
> combination does make sense here. I know mclapply uses "fork" on
> POSIX systems. But what is the difference? What makes the combination
> faster than either part on its own?

Your alternatives are mclapply(2 cores) with the fast BLAS running 
sequentially on each core (fastest), fast BLAS running in parallel on 2 
cores, and mclapply(2 cores) with standard BLAS running sequentially on 
each core.
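
A minimal sketch of how such a comparison could be timed from within R. The workload (`work`, `mats`, the 200 x 200 size and the task count) is hypothetical and is not the experiment discussed above. The number of BLAS threads is not set here: it is normally fixed outside the script, for example through an environment variable read by the BLAS library before R starts, and the details depend on which BLAS is installed. Run once per BLAS setting to fill in the three alternatives.

```r
library(parallel)

# Hypothetical workload: one dense cross-product per task.
set.seed(1)
n_tasks <- 4
mats <- replicate(n_tasks, matrix(rnorm(200 * 200), 200, 200),
                  simplify = FALSE)
work <- function(m) sum(crossprod(m))

# mclapply forks the R process; on Windows only mc.cores = 1 is supported.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same tasks, run sequentially and then across forked workers.
t_serial <- system.time(res_serial <- lapply(mats, work))[["elapsed"]]
t_fork   <- system.time(
  res_fork <- mclapply(mats, work, mc.cores = cores)
)[["elapsed"]]

print(c(serial = t_serial, forked = t_fork))
```

Tasks this small may be dominated by the cost of forking, so the numbers only become informative once each task takes a noticeable fraction of a second.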

For your problem and on your hardware, mclapply (forking the process) 
combined with sequential (single core) fast BLAS is best. The size of the 
level 2 cache affects how large a problem chunk a tuned/fast BLAS can 
compute in parallel before it has to use the unified cache. For cache see:

http://en.wikipedia.org/wiki/CPU_cache#Multi-level_caches

It looks as though "modern" processors with unified cache may not be great 
for numerical work unless the L3 unified cache is relatively large, but 
I'm just speculating, maybe someone knows?

You could change the size of the problem and see if your conclusions 
change. Read up on the difference between forking and starting new 
processes (it's among other things about memory). These things do vary 
from hardware to hardware and task to task.
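
As a rough sketch of the "change the size of the problem" suggestion, the loop below times the same forked and sequential runs at several matrix sizes. The sizes, the task count and the helper `time_one` are illustrative assumptions, not values from this thread. Because mclapply forks, the children start with the parent's memory available to them rather than receiving a copy of the data, which is one of the memory-related differences from starting new processes.

```r
library(parallel)

# mclapply forks the R process; on Windows only mc.cores = 1 is supported.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
sizes <- c(50, 100, 200)  # hypothetical problem sizes; increase as needed

# Time one problem size: n_tasks matrix products, sequential vs forked.
time_one <- function(n, n_tasks = 4) {
  mats <- replicate(n_tasks, matrix(runif(n * n), n, n), simplify = FALSE)
  work <- function(m) sum(m %*% m)
  serial <- system.time(lapply(mats, work))[["elapsed"]]
  forked <- system.time(mclapply(mats, work, mc.cores = cores))[["elapsed"]]
  c(n = n, serial = serial, forked = forked)
}

# One row per problem size.
timings <- do.call(rbind, lapply(sizes, time_one))
print(timings)
```

If the ranking of the alternatives flips as `n` grows, that points to cache size or fork overhead rather than to the method itself.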

Hope this helps,

Roger

>
> Thanks all of you. Have a nice day~
>
> Xuening
>
>
> 2014-02-23 2:33 GMT+08:00 Roger Bivand <Roger.Bivand at nhh.no>:
>
>> On Sat, 22 Feb 2014, beleites,claudia wrote:
>>
>>> Hi Xuening,
>>>
>>> 2 physical vs 2 physical * 2 logical threads: See e.g. here:
>>> http://unix.stackexchange.com/a/88290
>>>
>>> You say you have 2 *physical* cores. That's the number you want to use
>>> for the parallel execution. Logical cores are just 2 (or more) threads
>>> running on the same physical core. IIRC, this can speed up things mainly if
>>> the 2 threads run very different operations.
>>>
>>
>> Yes, this is my experience - I turn off Intel hyperthreads in BIOS to
>> prevent software getting confused. BLAS sees available compute resources,
>> so your BLAS may be installed to see 4 cores, but doesn't know that two are
>> hyperthreads and compete for physical resources. It may be that by limiting
>> BLAS to 2, it gets privileged access to the two real cores, and other OS
>> (or other) tasks running at the same time use the hyperthreads.
>>
>> Roger
>>
>>
> --
> Xuening Zhu
> --------------------------------------------------------
> Master of Business Statistics
> Guanghua School of Management, Peking University
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no