[R-SIG-Mac] How to Speed up R on the G5

Tue Feb 8 03:25:34 CET 2005

On 08/02/2005, at 11:44 AM, Byron Ellis wrote:
>>> My second question is whether there are ways other than using
>>> --with-blas="-framework vecLib", to take advantage of what I thought
>>> was the power of the G5 (or dual G5s in my case).
>>
>> Run top and see if you are using both cpus.  If not then Rmpi or
>> something like that may pay big dividends.
>
> FWIW, all versions of R are single processor (it doesn't even have a
> GIL) so that won't make much difference when comparing the two systems.

As I understand it, the Rmpi package works in conjunction with SPRNG (a
synchronised parallel RNG) and MPI libraries to allow the writing of R
code which can run in multiple processes on more than one CPU/machine.
Normal R code only runs on one CPU, but using Rmpi it should be almost
trivially easy (once you get Rmpi to work) to get multiple MCMC chains
to run on multiple CPUs.  Whether you can do the same with other code
depends on the code.  'GIL' means nothing to me, I'm afraid.
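
To illustrate the "one chain per CPU" idea, here is a minimal sketch in
Python rather than R.  It does not use Rmpi, MPI or SPRNG: the standard
multiprocessing module stands in for them, and a separately seeded
generator per chain stands in for a proper parallel RNG.  The names
run_chain and run_chains and the toy standard-normal target are
assumptions made up for the example.

```python
import math
import random
from multiprocessing import Pool


def run_chain(seed, n_iter=5000):
    """Random-walk Metropolis chain on a standard normal; returns the sample mean."""
    rng = random.Random(seed)  # one independent stream per chain (stand-in for SPRNG)
    x = 0.0
    total = 0.0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, 1.0)
        # Log acceptance ratio for a target density proportional to exp(-x^2 / 2).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        total += x
    return total / n_iter


def run_chains(seeds, processes=2):
    """Run one chain per seed, spread over `processes` worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(run_chain, seeds)


if __name__ == "__main__":
    # Four chains on two workers, e.g. one worker per CPU of a dual G5.
    print(run_chains([1, 2, 3, 4]))
```

This only works so easily because the chains are independent of each
other; code whose steps depend on one another would need real
communication between the processes.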

> It may also be that there is a large branching penalty for spending
> time in interpreted code (i.e. within R itself) on the G5 when compared
> to the Opteron---you can see that from the SPECint benchmarks. AFAIK
> the Opteron has a very small number of inflight instructions (fewer
> even than the Pentium 3/4. Speaking of which, I was at a thing with a
> couple of guys from SLAC and they were mentioning that the best way to
> boost P4 performance is to turn off SMT), something like 90 compared to
> the G5's 200 or so.

Of course the way to find out is to profile the code using Shark.  I 
would take a lot of convincing that SPECint measures had much relevance 
to running simulations in R.
>>
>> Finally some (so far very preliminary) experience:
>> I have spent a little time on JAGS, a WinBUGS (MCMC) work-alike which
>> uses the standalone libRmath.  Running the WinBUGS kidney example,
>> this code spends almost all its time in the libm functions power, exp
>> and log which are called from the Weibull distribution functions in R.
>>  AFAIK these are not vectorised. At the moment I am not comparing Mac vs
>> PC but WinBUGS vs JAGS.  The author of JAGS thinks the sampling code
>> is inefficient, hence the libm functions are called too often.  I am
>> interested in trying to replace the calls through libRmath into libm
>> with vectorised code, which I suspect will be much more effective on
>> the Mac.
>
> Interesting. I don't know how much a vector version would help with
> something like a Gibbs sampler where the next draw depends on values
> that change on each iteration through the sampler...

That is why you need Shark.  Using it shows that the MCMC run spends
very little time in the sampler itself, and lots of time
in libm.  There are only two possible ways to speed it up:
1.  Use a better sampler which makes fewer calls to the R Weibull stuff 
and hence to libm.
and/or
2.  Replace the calls to libm with something faster (and possibly lower
precision) such as AltiVec calls.

Either or both together should work.
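
As a rough illustration of how the number of libm-style calls can be cut
(not how JAGS or libRmath actually do it), the sketch below compares a
direct Weibull log-density with a log-likelihood that caches log(x) for
the fixed data.  It is written in Python, the function names are
hypothetical, and the shape/scale parameterisation is an assumption.
The direct form costs three logs and one pow per observation; the cached
form costs one exp per observation plus two logs per call.

```python
import math


def weibull_logpdf(x, shape, scale):
    """Direct form: three log calls and one pow call per evaluation."""
    return (math.log(shape) - math.log(scale)
            + (shape - 1.0) * (math.log(x) - math.log(scale))
            - math.pow(x / scale, shape))


def make_loglik(data):
    """Return a log-likelihood that reuses log(x), which never changes between iterations."""
    log_x = [math.log(x) for x in data]  # computed once, outside the sampler loop
    sum_log_x = sum(log_x)
    n = len(data)

    def loglik(shape, scale):
        log_shape = math.log(shape)  # two logs per call, not per observation
        log_scale = math.log(scale)
        # (x / scale) ** shape == exp(shape * (log x - log scale))
        power_terms = sum(math.exp(shape * (lx - log_scale)) for lx in log_x)
        return (n * (log_shape - log_scale)
                + (shape - 1.0) * (sum_log_x - n * log_scale)
                - power_terms)

    return loglik


if __name__ == "__main__":
    data = [0.5, 1.2, 3.0]
    loglik = make_loglik(data)
    print(loglik(1.5, 2.0))
    print(sum(weibull_logpdf(x, 1.5, 2.0) for x in data))
```

The single remaining loop of exp calls over an array is also the kind of
code that a vectorised math routine could replace in one go.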

Bill Northcott