[R-SIG-Mac] [R-sig-hpc] Grand Central Dispatch (simple loop optimization)

Simon Urbanek simon.urbanek at r-project.org
Thu Sep 17 22:49:02 CEST 2009


Jan,

On Sep 17, 2009, at 16:16 , Jan de Leeuw wrote:

> on my system (2 x 2.93 quad core Nehalem
> with hyper-threading, so 16 threads max, 16GB RAM,
> 10.6.1, 64bit kernel, 64bit R)
>
> > system.time(threads(100000,1000,"omp"))
>   user  system elapsed
> 10.249   0.009   0.662
> > system.time(threads(100000,1000,"gcd"))
>   user  system elapsed
> 10.208   0.008   0.668
> > system.time(threads(100000,1000,"dcg"))
>   user  system elapsed
>  8.731   0.005   8.738
>
> so omp == gcd, but for more complicated tasks the
> tighter integration may favor gcd
>
> comparing harpertown and nehalem --> surprising
> difference (kernel ? hyper-threading ?)
>

Interesting but consistent with my observations so far - Nehalems are  
not any faster than equally clocked Harpertowns (see dcg time). The  
only gains are in HT as seen in your example - my Harpertown has 4  
logical cpus, yours has 16. My 2.26GHz Nehalem is running Leopard  
(because it's the build machine ;)) but the results are similar:

 > system.time(threads(100000,1000,"omp_try"))
    user  system elapsed
  12.924   0.031   0.852
 > system.time(threads(100000,1000,"dcg_try"))
    user  system elapsed
  11.595   0.009  11.608

Again, the sequential time is about the same as on equally clocked  
Harpertown, but the HT helps with a factor of over 13. That explains  
where the alleged performance boost on Nehalems comes from ...
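
The factors quoted in this thread can be reproduced from the elapsed columns of the system.time() output. The following is only a minimal sketch of that arithmetic; the function names speedup() and efficiency() are hypothetical helpers introduced here for illustration and are not part of the threads code under discussion.

```c
#include <assert.h>

/* Hypothetical helper: speedup is the sequential elapsed time
   divided by the parallel elapsed time (both in seconds). */
double speedup(double sequential_elapsed, double parallel_elapsed)
{
    assert(parallel_elapsed > 0.0);
    return sequential_elapsed / parallel_elapsed;
}

/* Hypothetical helper: parallel efficiency is the speedup divided
   by the number of logical CPUs the parallel run could use. */
double efficiency(double sequential_elapsed, double parallel_elapsed,
                  int logical_cpus)
{
    assert(logical_cpus > 0);
    return speedup(sequential_elapsed, parallel_elapsed) / logical_cpus;
}
```

With the numbers above, speedup(11.608, 0.852) is roughly 13.6 on the 2.26GHz Nehalem, speedup(8.738, 0.662) is roughly 13.2 on the 2.93GHz Nehalem, and speedup(9.788, 2.441) is roughly 4.0 on the quad-core Harpertown. Taking the 16 threads reported for the 2.93GHz machine, efficiency(8.738, 0.662, 16) comes out at a little over 0.82.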

It would be interesting to run OMP pnmath with schedule(dynamic) on an  
8-core Nehalem and compare that with a stock R ... (pnmath will need a  
bit of tweaking because it attempts to be too smart on the number of  
threads). Clearly, on many short operations it may cause a hit, but  
the gain on long vectors is up to 16 which is impressive ...
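
For readers who want to try the schedule(dynamic) variant, here is a minimal sketch of a loop carrying the OMP line quoted further down in this thread. The names work() and fill_omp() and the loop body are assumptions made up for illustration; they are not taken from the threads code or from pnmath. Build with an OpenMP-enabled compiler (for example gcc -fopenmp); without that flag the pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element workload: a partial harmonic-type sum
   of m terms. It only stands in for "some work per iteration". */
static double work(int i, int m)
{
    double s = 0.0;
    for (int k = 1; k <= m; k++)
        s += 1.0 / (double)(i + k);
    return s;
}

/* Fill out[0..n-1], letting OpenMP hand iterations to threads
   dynamically. The pragma goes directly before the for() loop. */
void fill_omp(double *out, int n, int m)
{
    int i;
    assert(out != NULL && n >= 0);
#pragma omp parallel for default(shared) private(i) schedule(dynamic)
    for (i = 0; i < n; i++)
        out[i] = work(i, m);
}
```

Each iteration writes only its own out[i], so the shared array needs no locking. With schedule(dynamic) idle threads pick up the next iteration as they finish, which helps when iterations take uneven time but adds scheduling overhead on very short ones.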

Cheers,
Simon


> i have no idea how the open-sourced gcd works on
> non-mac hardware
>
> code is downloadable using webdav from
> public.me.com/jdeleeuw/software/threads
> or using afp://gifi.stat.ucla.edu from
> the deleeuw public directory
>
>
> On Sep 17, 2009, at 12:35 , Simon Urbanek wrote:
>
>> On Sep 17, 2009, at 15:20 , Simon Urbanek wrote:
>>
>>> Jan,
>>>
>>> thanks for sharing this. This is really interesting. We have been  
>>> contemplating using GCD for R (mainly pnmath) but at the time OMP  
>>> was faster. However, GCD has apparently become really good in the meantime:
>>>
>>> > system.time(threads(100000,1000,"omp_try"))
>>> user  system elapsed
>>> 9.671   0.009   2.441
>>> > system.time(threads(100000,1000,"gcd_try"))
>>> user  system elapsed
>>> 9.592   0.004   2.410
>>> > system.time(threads(100000,1000,"dcg_try"))
>>> user  system elapsed
>>> 9.784   0.003   9.788
>>>
>>> [This is on Harpertown 2.66GHz quad core]
>>>
>>> So GCD is surprisingly just a hair faster than OMP (also  
>>> surprising to me is that using more threads than cores makes OMP  
>>> faster - the above is with 16 threads).
>>>
>>
>> Actually, with schedule(dynamic) the gap is almost at the level of  
>> the measurement error:
>>
>> > system.time(threads(100000,1000,"omp_try"))
>>  user  system elapsed
>> 9.614   0.006   2.420
>> > system.time(threads(100000,1000,"gcd_try"))
>>  user  system elapsed
>> 9.586   0.005   2.409
>>
>> -- the OMP line (to be placed before the for() loop) is
>>
>> #pragma omp parallel for default(shared) private(i) schedule(dynamic)
>>
>> Cheers,
>> Simon
>>
>>
>>>
>>> On Sep 17, 2009, at 14:24 , Jan de Leeuw wrote:
>>>
>>>> a) Obviously OpenMP is more portable. Even on a Mac I had to use  
>>>> Apple's gcc in this case
>>>> (I normally use the GNU gcc-trunk).
>>>>
>>>> b) GCD does not require specifying the number of threads -- it  
>>>> determines it at runtime.
>>>>
>>>> c) Coding is simpler.
>>>>
>>>
>>> I would not say so - OMP takes just one #pragma, with no need to change  
>>> your code, whereas GCD requires several special function calls...  
>>> However, OMP is more limited in the kind of things you can do.
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>>> d) Since GCD is at a lower OS level than OpenMP, it will probably  
>>>> handle resource allocation
>>>> better. But my small example, on an otherwise idle Mac Pro (16  
>>>> cores, 32 GB of RAM), does
>>>> not really highlight that.
>>>>
>>>> e) For more info, and some OpenMP comparisons, see
>>>>
>>>> http://www.macresearch.org/cocoa-scientists-xxxi-all-aboard-grand-central
>>>> http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/12
>>>>
>>>> To quote Siracusa
>>>>
>>>> "Write your application as usual, but if there's any part of its  
>>>> operation that can
>>>> reasonably be expected to take more than a few seconds to  
>>>> complete, then for the love of Zarzycki,
>>>> get it off the main thread!"
>>>>
>>>> On Sep 17, 2009, at 11:03 , Saptarshi Guha wrote:
>>>>
>>>>> Nice, how does this compare when using OpenMP?
>>>>> How does it compare when several other core-hungry processes are  
>>>>> running? (GCD is supposed to handle resource allocation nicely;  
>>>>> does OpenMP compete with the other processes?)
>>>>>
>>>>> Regards
>>>>> Saptarshi
>>>>>
>>>>>
>>>>
>>>> ===
>>>> Jan de Leeuw; Distinguished Professor and Chair, UCLA Department  
>>>> of Statistics;
>>>> Director: UCLA Center for Environmental Statistics (CES);
>>>> Editor: Journal of Multivariate Analysis, Journal of Statistical  
>>>> Software;
>>>> US mail: 8125 Math Sciences Bldg, Box 951554, Los Angeles, CA  
>>>> 90095-1554
>>>> phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw at stat.ucla.edu
>>>> .mac: jdeleeuw ++++++  aim: deleeuwjan ++++++ skype: j_deleeuw
>>>> homepages: http://gifi.stat.ucla.edu ++++++ http://www.cuddyvalley.org
>>>> -------------------------------------------------------------------------------------------------
>>>>       No matter where you go, there you are. --- Buckaroo Banzai
>>>>                http://gifi.stat.ucla.edu/sounds/nomatter.au
>>>>
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>


