[R-SIG-Mac] Optimization flags for G5 and G4

Simon Urbanek simon.urbanek at r-project.org
Sun Feb 20 21:57:39 CET 2005


Due to popular demand I decided to post a short note on optimization 
flags for Macs.

In general if you want your code to be optimized for (and work only 
on!) a specific CPU, say G5, all you have to do is to specify
-mtune=G5 -mcpu=G5
By the way, using "G4" is also legal, of course.

However, there is more to optimization than just the CPU: alignment, 
inlining, loops etc. can all be handled in several ways. Therefore 
there are many additional flags you can use. The most aggressive flag 
is "-fast" (see man gcc), but it's not recommended for R, because it 
includes -ffast-math, which produces floating-point code that does not 
conform to IEEE and simply produces wrong results in some cases (this 
matters for the handling of NAs, which then doesn't work reliably). But 
you can try most of the other flags.

Note also that g77 doesn't necessarily support the same arguments as 
gcc, so you may need to cut down your list for Fortran.
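
One way to cut the list down without guessing is to try each flag on a 
tiny probe file and keep only the ones the compiler accepts. The sketch 
below is an illustration, not part of the R build system: the helper 
name keep_supported_flags and the probe file are made up here, and it 
assumes the compiler exits with a non-zero status when it rejects a 
flag.

```shell
# Hypothetical helper: print only those flags that COMPILER accepts.
# Usage: keep_supported_flags COMPILER SOURCE_FILE FLAG...
keep_supported_flags() {
    compiler=$1
    src=$2
    shift 2
    kept=
    for flag in "$@"; do
        # Compile the probe file with this single flag; discard all output.
        if "$compiler" "$flag" -c "$src" -o /dev/null >/dev/null 2>&1; then
            kept="${kept:+$kept }$flag"
        fi
    done
    printf '%s\n' "$kept"
}

# Example: filter a few candidate flags for g77, if g77 is installed.
if command -v g77 >/dev/null 2>&1; then
    probe=${TMPDIR:-/tmp}/probe.f
    printf '      end\n' > "$probe"   # smallest valid fixed-form program
    FFLAGS=$(keep_supported_flags g77 "$probe" \
        -O3 -funroll-loops -falign-loops-max-skip=15)
    echo "FFLAGS=$FFLAGS"
    rm -f "$probe"
fi
```

A flag that passes this probe is only known to be accepted, not to be 
safe, so the result still needs testing.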

When using the flags, don't forget to specify both CFLAGS and FFLAGS. 
In most cases you probably want to set CXXFLAGS, too, because some 
routines (such as svm or gbp for example) use C++ code.
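
As a rough sketch of how the three variables fit together, the snippet 
below exports them and then builds from the top of an R source tree. 
The flag lists are shortened for illustration, the function name 
build_with_flags is made up, and passing the flags through the 
environment is just one option (assumed here to be picked up by 
configure).

```shell
# Shortened flag lists, for illustration only.
CFLAGS='-O3 -funroll-loops -mtune=G5 -mcpu=G5'
FFLAGS='-O3 -funroll-loops -mtune=G5 -mcpu=G5'
CXXFLAGS=$CFLAGS            # C++ code gets the same flags as C code
export CFLAGS FFLAGS CXXFLAGS

# Hypothetical wrapper: configure, build, then run the test suite.
build_with_flags() {
    ./configure && make && make check
}

# Only build when we really are in a source tree with a configure script.
if [ -x ./configure ]; then
    build_with_flags
else
    echo "no ./configure here; flags would be: $CFLAGS"
fi
```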

Finally, here are some examples of what you may want to try on a G5 
machine:
CFLAGS='-O3 -fgcse-sm -funroll-loops -fstrict-aliasing 
-fsched-interblock -falign-loops=16 -falign-jumps=16 
-falign-functions=16 -falign-jumps-max-skip=15 
-falign-loops-max-skip=15 -malign-natural -freorder-blocks 
-freorder-blocks-and-partition -mpowerpc-gfxopt -mpowerpc-gpopt 
-fstrict-aliasing -ftree-vectorize -mtune=G5 -mcpu=G5'
FFLAGS='-O3 -fgcse-sm -funroll-loops -fstrict-aliasing 
-fsched-interblock -falign-loops=16 -falign-jumps=16 
-falign-functions=16 -malign-natural -freorder-blocks 
-freorder-blocks-and-partition -mpowerpc-gfxopt -mpowerpc-gpopt 
-fstrict-aliasing -ftree-vectorize -mtune=G5 -mcpu=G5'

Note that -ftree-vectorize is supported only as of gcc 4.0, so remove 
it for older compilers. Normally you should set CXXFLAGS to the same as 
CFLAGS. Always run make check to see whether the compiled code still 
behaves correctly. Your mileage may vary.
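
Dropping -ftree-vectorize for older compilers can be scripted. The 
following is a minimal sketch under some assumptions: strip_flag is a 
made-up helper, the flag list is shortened, and the compiler is assumed 
to be called gcc and to report its version via -dumpversion. If gcc 
cannot be found, the sketch treats it as an old compiler.

```shell
# Hypothetical helper: remove one flag from a space-separated flag list.
strip_flag() {
    flags=$1
    unwanted=$2
    out=
    for f in $flags; do
        [ "$f" = "$unwanted" ] || out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

# Shortened flag list, for illustration only.
CFLAGS='-O3 -funroll-loops -ftree-vectorize -mtune=G5 -mcpu=G5'

# Major version of gcc; fall back to 0 if gcc is missing or unparsable.
gcc_major=$(gcc -dumpversion 2>/dev/null | cut -d. -f1)
case "$gcc_major" in
    ''|*[!0-9]*) gcc_major=0 ;;
esac

if [ "$gcc_major" -lt 4 ]; then
    CFLAGS=$(strip_flag "$CFLAGS" -ftree-vectorize)
fi
echo "CFLAGS=$CFLAGS"
```

The same call can be applied to FFLAGS before the build.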

A word on optimization: the benefit of compiler optimization will vary 
a lot with the application. Optimization affects only C/Fortran code in 
R and packages, so it is not likely to speed up your application if you 
use BLAS heavily, because that is already optimized. Finally, and most 
importantly, compiler optimization won't improve the speed if you use 
bad R code. Your first step should always be to optimize your R code 
(use the profiler to figure out where most time is spent; there were 
cases where 80% of the time was spent on unnecessary coercions). If 
speed is really that important to you, try the R byte-compiler; it may 
or may not help in your case.

Cheers,
Simon


