[R-SIG-Mac] [R-sig-hpc] Grand Central Dispatch (simple loop optimization)
Simon Urbanek
simon.urbanek at r-project.org
Thu Sep 17 22:49:02 CEST 2009
Jan,
On Sep 17, 2009, at 16:16 , Jan de Leeuw wrote:
> on my system (2 x 2.93 quad core Nehalem
> with hyper-threading, so 16 threads max, 16GB RAM,
> 10.6.1, 64bit kernel, 64bit R)
>
> > system.time(threads(100000,1000,"omp"))
>    user  system elapsed
>  10.249   0.009   0.662
> > system.time(threads(100000,1000,"gcd"))
>    user  system elapsed
>  10.208   0.008   0.668
> > system.time(threads(100000,1000,"dcg"))
>    user  system elapsed
>   8.731   0.005   8.738
>
> so omp == gcd, but for more complicated tasks the
> tighter integration may favor gcd
>
> comparing harpertown and nehalem --> surprising
> difference (kernel ? hyper-threading ?)
>
Interesting but consistent with my observations so far - Nehalems are
not any faster than equally clocked Harpertowns (see dcg time). The
only gains are in HT as seen in your example - my Harpertown has 4
logical cpus, yours has 16. My 2.26GHz Nehalem is running Leopard
(because it's the build machine ;)) but the results are similar:
> system.time(threads(100000,1000,"omp_try"))
   user  system elapsed
 12.924   0.031   0.852
> system.time(threads(100000,1000,"dcg_try"))
   user  system elapsed
 11.595   0.009  11.608
Again, the sequential time is about the same as on an equally clocked
Harpertown, but with HT the parallel run is faster by a factor of over 13.
That explains where the alleged performance boost on Nehalems comes from ...
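For reference, the factors quoted here are simply the elapsed time of the sequential run divided by the elapsed time of the parallel run. A minimal sketch of that arithmetic in C, using only timings that appear in this thread; the function names speedup() and report_speedups() are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Speedup = elapsed time of the sequential run / elapsed time of the
 * parallel run (both in seconds, as reported by system.time()). */
double speedup(double seq_elapsed, double par_elapsed)
{
    assert(seq_elapsed > 0.0 && par_elapsed > 0.0);
    return seq_elapsed / par_elapsed;
}

/* Print the speedups for the three machines mentioned in the thread.
 * The labels are only descriptive; the numbers are the quoted timings. */
void report_speedups(void)
{
    /* 2.26GHz Nehalem: dcg_try vs. omp_try, a bit over 13 */
    printf("Nehalem 2.26GHz:    %.2f\n", speedup(11.608, 0.852));
    /* 2.93GHz Nehalem: dcg vs. omp, also a bit over 13 */
    printf("Nehalem 2.93GHz:    %.2f\n", speedup(8.738, 0.662));
    /* 2.66GHz Harpertown quad core: dcg_try vs. omp_try, about 4 */
    printf("Harpertown 2.66GHz: %.2f\n", speedup(9.788, 2.441));
}
```

Calling report_speedups() from a main() prints the three ratios. Note that the "user" column is not used: it sums CPU time over all threads, so only "elapsed" reflects the wall-clock gain.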
It would be interesting to run OMP pnmath with schedule(dynamic) on an
8-core Nehalem and compare that with a stock R ... (pnmath will need a
bit of tweaking because it attempts to be too smart about the number of
threads). Clearly, on many short operations it may cause a hit, but
the gain on long vectors is up to a factor of 16, which is impressive ...
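To make the idea concrete, here is a minimal sketch of a loop parallelized with the schedule(dynamic) pragma quoted further down in this thread. It is not the pnmath or threads code: the function name fill_dynamic() and the toy inner loop are assumptions, chosen only so that every iteration does some independent work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: out[i] = sum over j = 1..inner of i / j.
 * Each outer iteration is independent, so the loop can be split
 * across threads without any locking. */
void fill_dynamic(double *out, int n, int inner)
{
    int i;

    assert(out != NULL && n >= 0 && inner >= 0);

    /* The pragma from the thread. When the file is compiled without
     * OpenMP support the pragma is ignored and the loop simply runs
     * sequentially, giving the same result. */
    #pragma omp parallel for default(shared) private(i) schedule(dynamic)
    for (i = 0; i < n; i++) {
        double acc = 0.0;   /* declared inside the loop: private per iteration */
        int j;

        for (j = 1; j <= inner; j++)
            acc += (double) i / (double) j;
        out[i] = acc;
    }
}
```

With gcc this would be built with -fopenmp. schedule(dynamic) hands out iterations to threads as they become free (one at a time unless a chunk size is given), which helps when iterations take uneven time but adds scheduling overhead when each iteration is very short.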
Cheers,
Simon
> i have no idea how the open-sourced gcd works on
> non-mac hardware
>
> code is downloadable using webdav from
> public.me.com/jdeleeuw/software/threads
> or using afp://gifi.stat.ucla.edu from
> the deleeuw public directory
>
>
> On Sep 17, 2009, at 12:35 , Simon Urbanek wrote:
>
>> On Sep 17, 2009, at 15:20 , Simon Urbanek wrote:
>>
>>> Jan,
>>>
>>> thanks for sharing this. This is really interesting. We have been
>>> contemplating using GCD for R (mainly pnmath) but at the time OMP
>>> was faster. However, GCD has apparently become really good in the meantime:
>>>
>>> > system.time(threads(100000,1000,"omp_try"))
>>>    user  system elapsed
>>>   9.671   0.009   2.441
>>> > system.time(threads(100000,1000,"gcd_try"))
>>>    user  system elapsed
>>>   9.592   0.004   2.410
>>> > system.time(threads(100000,1000,"dcg_try"))
>>>    user  system elapsed
>>>   9.784   0.003   9.788
>>>
>>> [This is on Harpertown 2.66GHz quad core]
>>>
>>> So GCD is surprisingly just a hair faster than OMP (also
>>> surprising to me is that using more threads than cores makes OMP
>>> faster - the above is with 16 threads).
>>>
>>
>> Actually, with schedule(dynamic) the gap is almost at the level of
>> the measurement error:
>>
>> > system.time(threads(100000,1000,"omp_try"))
>>    user  system elapsed
>>   9.614   0.006   2.420
>> > system.time(threads(100000,1000,"gcd_try"))
>>    user  system elapsed
>>   9.586   0.005   2.409
>>
>> -- the OMP line (to be placed before the for() loop) is
>> #pragma omp parallel for default(shared) private(i) schedule(dynamic)
>>
>> Cheers,
>> Simon
>>
>>
>>>
>>> On Sep 17, 2009, at 14:24 , Jan de Leeuw wrote:
>>>
>>>> a) Obviously OpenMP is more portable. Even on a Mac I had to use
>>>> Apple's gcc in this case
>>>> (I normally use the GNU gcc-trunk).
>>>>
>>>> b) GCD does not require specifying the number of threads -- it
>>>> determines it at runtime.
>>>>
>>>> c) Coding is simpler.
>>>>
>>>
>>> I would not say so - OMP takes just one #pragma, with no need to
>>> change your code, whereas GCD requires several special function calls...
>>> However, OMP is more limited in the kind of things you can do.
>>>
>>> Cheers,
>>> Simon
>>>
>>>
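For comparison, a minimal sketch of what the GCD side of such a loop might look like, using the function-pointer call dispatch_apply_f() so that no blocks extension is needed. This is not the code from the threads example: fill_gcd(), fill_one() and struct fill_ctx are hypothetical names, and the sequential fallback is there only so that the sketch still builds where libdispatch is not installed.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <dispatch/dispatch.h>
#endif

/* Hypothetical context handed to every iteration. */
struct fill_ctx {
    double *out;   /* result vector */
    int inner;     /* length of the inner loop */
};

/* Work function with the (context, index) signature that
 * dispatch_apply_f() expects. Computes out[i] = sum_{j=1..inner} i / j. */
static void fill_one(void *context, size_t i)
{
    struct fill_ctx *ctx = context;
    double acc = 0.0;
    int j;

    for (j = 1; j <= ctx->inner; j++)
        acc += (double) i / (double) j;
    ctx->out[i] = acc;
}

void fill_gcd(double *out, size_t n, int inner)
{
    struct fill_ctx ctx = { out, inner };

    assert(out != NULL && inner >= 0);
#ifdef __APPLE__
    /* No thread count is given: GCD decides how many workers to use.
     * dispatch_apply_f() returns only after all n iterations are done. */
    dispatch_apply_f(n,
                     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                     &ctx, fill_one);
#else
    /* Sequential fallback for systems without libdispatch. */
    for (size_t i = 0; i < n; i++)
        fill_one(&ctx, i);
#endif
}
```

This illustrates both points made above: the loop body has to be moved into a separate function (or a block, with the more common dispatch_apply() form), which is more intrusive than a single #pragma, but no number of threads appears anywhere in the code.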
>>>> d) Since GCD is at a lower OS level than OpenMP, it will probably
>>>> handle resource allocation
>>>> better. But my small example, on an otherwise idle Mac Pro (16
>>>> cores, 32 GB of RAM), does
>>>> not really highlight that.
>>>>
>>>> e) For more info, and some OpenMP comparisons, see
>>>>
>>>> http://www.macresearch.org/cocoa-scientists-xxxi-all-aboard-grand-central
>>>> http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/12
>>>>
>>>> To quote Siracusa
>>>>
>>>> "Write your application as usual, but if there's any part of its
>>>> operation that can
>>>> reasonably be expected to take more than a few seconds to
>>>> complete, then for the love of Zarzycki,
>>>> get it off the main thread!"
>>>>
>>>> On Sep 17, 2009, at 11:03 , Saptarshi Guha wrote:
>>>>
>>>>> Nice, how does this compare when using OpenMP?
>>>>> How does it compare when several other core-hungry processes are
>>>>> running? (GCD is supposed to handle resource allocation nicely;
>>>>> does OpenMP compete with the other processes?)
>>>>>
>>>>> Regards
>>>>> Saptarshi
>>>>>
>>>>>
>>>>
>>>> ===
>>>> Jan de Leeuw; Distinguished Professor and Chair, UCLA Department
>>>> of Statistics;
>>>> Director: UCLA Center for Environmental Statistics (CES);
>>>> Editor: Journal of Multivariate Analysis, Journal of Statistical
>>>> Software;
>>>> US mail: 8125 Math Sciences Bldg, Box 951554, Los Angeles, CA
>>>> 90095-1554
>>>> phone (310)-825-9550; fax (310)-206-5658; email: deleeuw at stat.ucla.edu
>>>> .mac: jdeleeuw ++++++ aim: deleeuwjan ++++++ skype: j_deleeuw
>>>> homepages: http://gifi.stat.ucla.edu ++++++ http://www.cuddyvalley.org
>>>> -------------------------------------------------------------------------------------------------
>>>> No matter where you go, there you are. --- Buckaroo Banzai
>>>> http://gifi.stat.ucla.edu/sounds/nomatter.au
>>>>
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>>
>>>>