[R-SIG-Mac] How to Speed up R on the G5
Bill Northcott
w.northcott at unsw.edu.au
Tue Feb 8 01:21:07 CET 2005
On 08/02/2005, at 3:19 AM, Jake Bowers wrote:
> I've been receiving some friendly grief from a friend with a Linux
> dual-Opteron system about the performance of his R package on the OS X
> G5
> system.
>
> He has suggested recompiling R-patched with a variety of different
> compilers and compiler flags. And has also suggested just recompiling
> his package with different flags and compilers (while leaving
> r-patched as I have currently built it using gcc 3.3 20030304 (Apple
> Computer, Inc. build 1671), and g77 3.4.2 (from that wonderful site:
> hpc.sf.net)).
Apple have put up a good article on Performance optimisation:
http://developer.apple.com/tools/sharkoptimize.html
The moral is: 'measure first. Futz afterwards.' You really must start
by finding out where the program is spending the time. There is
absolutely no point optimising code that is rarely called. There can
be huge gains from optimising very small amounts of heavily used code.
On the G4/G5 architectures, the big gains come if you can vectorise any
of that heavily used code. If you are not using Altivec, half the CPU
is hanging around doing nothing.
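For example, R's built-in profiler will tell you where the time actually
goes before you touch any compiler flags (a minimal sketch; my_analysis()
is just a stand-in for whatever function in your package is slow):

    Rprof("profile.out")
    my_analysis()                        # hypothetical call to the slow code
    Rprof(NULL)
    summaryRprof("profile.out")$by.self  # functions ranked by own time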
>
> My second question is whether there are ways other than using
> --with-blas="-framework vecLib", to take advantage of what I thought
> was the power of the G5 (or dual G5s in my case).
Run top and see if you are using both CPUs. If not, then Rmpi or
something like that may pay big dividends.
>
> Here is what I'm playing with:
>
> 1) One set of builds with standard compilers and flags
> (--with-blas="-framework vecLib" --with-lapack)
I would take the advice from hpc.sf.net and just use the -fast flag,
but only on code that you know from profiling to be time critical.
There is a downside as others have observed below.
>
> 2) One build like (1) but using the libgoto.dylib version of BLAS and
> the vecLib stuff for lapack (It doesn't work with just
> --with-blas="-L/usr/local/lib -lgoto"
> --with-lapack).
> (http://www.cs.utexas.edu/users/kgoto/signup_first.html#For_OS_X)
IMHO from what I see on Goto's site, I doubt that libgoto will do
anything for the G5 architecture. The Power 3 data he shows indicates
little or no benefit. His optimisations seem to work well for x86 and
Alpha. However, none of this matters if you are not spending much
time in the BLAS library.
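A rough way to check is to time a large matrix product, which spends
essentially all of its time in the BLAS, and compare that with your real
workload (just a sketch; the matrix size is arbitrary):

    set.seed(1)
    a <- matrix(rnorm(1000 * 1000), 1000, 1000)
    system.time(crossprod(a))   # dominated by level-3 BLAS (DSYRK/DGEMM)

If this flies under vecLib but your own code barely changes, the BLAS is
not where your time is going.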
> but, although this compiled ok, it failed the make check on the first
> test (base-Ex.R with:
>> tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
>> x <- rep(0:8, tx0)
>> stopifnot(table(x) == tx0)
> Warning in table(x) == tx0 : longer object length
> is not a multiple of shorter object length
> Error in stopifnot(table(x) == tx0) : dim<- : dims [product 8] do not
> match the length of object [9]
> Execution halted)
>
Only optimise where it matters: as the failed check above shows, swapping
in an optimised library can cause problems of its own.
> Finally, he suggested looking into the AbSoft compilers. But, I
> figured I'd save my money and see if other folks have had luck with
> those yet.
As far as I can see the IBM (not Absoft) xlf and xlc compilers are
significantly faster, although Apple is working hard on gcc to close
the gap.
Other thoughts:
1. I don't think there is any point in wasting time on Fortran. The base
R distribution as built on a Mac uses no Fortran code, and as far as I
can see very few R packages use Fortran.
2. Someone else mentioned MCMCs. These are embarrassingly parallel
applications, and if the chains are not using both CPUs they are going
to be inefficient.
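As a sketch of running one chain per CPU (assuming the snow package is
installed and set up; run_chain() is a hypothetical wrapper around your
own sampler):

    library(snow)

    run_chain <- function(seed) {
        set.seed(seed)
        ## ... call your sampler here and return its draws ...
    }

    cl <- makeCluster(2, type = "SOCK")   # one worker per G5 CPU
    chains <- clusterApply(cl, c(101, 202), run_chain)
    stopCluster(cl)

Rmpi can be used in a similar spirit, with MPI starting the two workers.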
Finally some (so far very preliminary) experience:
I have spent a little time on JAGS, a WinBUGS (MCMC) work-alike which
uses the standalone libRmath. Running the WinBUGS kidney example, this
code spends almost all its time in the libm functions pow, exp and
log, which are called from the Weibull distribution functions in R.
AFAIK these are not vectorised. At the moment I am not comparing Mac vs PC
but WinBUGS vs JAGS. The author of JAGS thinks the sampling code is
inefficient, hence the libm functions are called too often. I am
interested in trying to replace the calls through libRmath into libm
with vectorised code, which I suspect will be much more effective on
the Mac.
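As a purely R-level illustration of the difference vectorisation makes
(this is not the JAGS code itself, just the scalar-versus-vector idea),
compare one vectorised call to dweibull() with a loop of scalar calls:

    x <- runif(1e5, 0.1, 10)
    system.time(d1 <- dweibull(x, shape = 2, scale = 3))           # one vectorised call
    system.time(d2 <- sapply(x, dweibull, shape = 2, scale = 3))   # one scalar call per element
    stopifnot(all.equal(d1, d2))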
Of course aggressively optimising the compilation of the JAGS code
makes absolutely no discernible difference to overall performance in my
example.
Bill Northcott