[Rd] Parallel makes

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 9 20:36:33 CET 2009


Now that multi-core machines are more widely available, we have been
stress-testing the parallel building capabilities of R and of packages.
The current Windows and Mac build machines are both 8-core, and I test
on an 8-core machine.  These are all fairly recent hardware changes,
and the following applies only to R-devel, the version that will
become 2.9.0 next month.

Parallel builds of R under Unix-alikes have long been supported, and 
now allow rather more to be done in parallel.  Using 'make -j' will 
work on a machine with enough resources, and gives something like a 3x
speed-up.  The main limiting factor is converting help, which is done
serially and is likely to remain so until we move to R-based conversion.
New for this version is the ability to install in parallel (e.g. 'make 
-j install install-pdf').  It is also possible to check the R build in 
parallel, but the output is so intermingled that it is hard to see any 
discrepancies.  However, few people build R from scratch every day and 
for those with such powerful machines building R is probably already 
fast enough (ca 3 mins).

When installing or checking a single package, the only thing done in 
parallel is making any compiled code (2.8.x ran tests in the 'tests' 
directory in parallel, but this is not currently done).  The standard 
procedures work safely with parallel make, but users who write 
Makevars or Makefile files need to take this into account.  For 
example, yesterday's Rcsdp/src/Makevars has

.PHONY: all

all: before $(SHLIB)

before: Csdp.ts

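but nothing tells make that this 'before' target has to be completed
before $(SHLIB) can be built.  A serial make happens to build the
prerequisites of 'all' left to right, but a parallel make may attempt
both at once (and the package failed to install for me).  I am aware
that the documentation did not stress this sufficiently in the past,
and 'Writing R Extensions' has been revised to do so.  (And the package
has been updated with alacrity, thank you.)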
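
One way to make a fragment like the one above safe under 'make -j' is
to state the ordering explicitly: make the shared object itself depend
on the preliminary target, so the link step cannot start until that
target is done.  This is a minimal sketch, not the package's actual
fix; the rule that builds Csdp.ts is not shown in the excerpt and is
assumed unchanged.  Note that the object files may still compile
concurrently with that step, so if they need anything it produces,
they would need a similar dependency.

```make
## Sketch only: a parallel-safe variant of the Makevars fragment above.
## The rule that actually builds Csdp.ts is assumed unchanged (not shown).
.PHONY: all before

## 'all' now depends only on the shared object ...
all: $(SHLIB)

## ... and the shared object depends on 'before', so even under
## 'make -j' the link step waits until Csdp.ts is up to date.
$(SHLIB): before

before: Csdp.ts
```

Making $(SHLIB) depend directly on Csdp.ts would work equally well; the
point is that the ordering is a real prerequisite rather than an
accident of the left-to-right order in which a serial make happens to
work.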

Because Windows users are much less likely to be aware of these issues,
and because of the concerns in the last paragraph below, Uwe and I
tweaked the procedures for the Windows build machine so that packages
are always installed and checked with a non-parallel make.

Installing/updating packages in parallel can help a lot, and we've
made two changes to facilitate that.  First, there is a new option for
R CMD INSTALL, --pkglock.  This uses per-package locks, which prevent
more than one process from trying to install the same package at the
same time but allow several packages to be installed into the same
library simultaneously.  This places the onus on the caller to ensure
that dependencies are installed first.  Second, the 'Ncpus' option to
install.packages() provides a way to marshal package installation to
make the best use of multiple CPUs.
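
To illustrate what "the onus on the caller" means, here is a minimal
sketch of driving several --pkglock installs from a parallel make.  It
assumes three hypothetical source-package directories pkgA, pkgB and
pkgC, where pkgB and pkgC depend on pkgA, and a hypothetical library
directory LIB that already exists.  From within R, the 'Ncpus' argument
of install.packages() is the intended route and does not need a
hand-written file like this.

```make
## Hypothetical driver Makefile: run with e.g. 'make -j 3'.
## pkgA, pkgB, pkgC (source directories) and LIB are placeholders.
LIB = $(HOME)/R/library
R = R

.PHONY: all pkgA pkgB pkgC

all: pkgA pkgB pkgC

## --pkglock only stops two processes installing the *same* package;
## the dependency order has to be written out as prerequisites.
pkgB pkgC: pkgA

## One rule for all three targets; '$@' expands to the package name.
## The ';' form keeps the recipe on the rule line (no tab needed).
pkgA pkgB pkgC: ; $(R) CMD INSTALL --pkglock -l $(LIB) $@
```

With 'make -j 3', pkgA is installed first and pkgB and pkgC are then
installed into the same library at the same time.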

Under Windows 'make all' and 'make recommended' (but not 'make
distribution') can each be done in parallel.  There are some question
marks over how well the 'make' used on Windows works in parallel (we
found one case where it worked incorrectly and had to rethink
share/make/winshlib.mk), so it should be used with caution.

Please give these new facilities a go and report (here) how you get
on.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
