[Rd] Parallel makes
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Mar 9 20:36:33 CET 2009
Now that multi-core machines are more widely available, we have been able
to stress-test the parallel building capabilities of R and of packages.
The current Windows and Mac build machines are both 8-core and I test
on an 8-core machine. These are all fairly recent changes of hardware
and the following applies only to R-devel, the version to become 2.9.0
next month.
Parallel builds of R under Unix-alikes have long been supported, and
now allow rather more to be done in parallel. Using 'make -j' will
work on a machine with enough resources, and gives something like a 3x
speed up. The main limiting factor is converting help, which is done
serially and is likely to remain so until we move to R-based conversion.
New for this version is the ability to install in parallel (e.g. 'make
-j install install-pdf'). It is also possible to check the R build in
parallel, but the output is so intermingled that it is hard to see any
discrepancies. However, few people build R from scratch every day and
for those with such powerful machines building R is probably already
fast enough (ca 3 mins).
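
As a rough sketch of the commands described above, the shell function
below wraps the parallel build and the new parallel install. It assumes
the current directory is an already-configured R-devel build tree. The
function name and the MAKE_CMD and JOBS variables are illustrative only,
and the bounded '-j N' form is a substitution for the unbounded 'make -j'
mentioned above, for machines with fewer resources.

```shell
#!/bin/sh
# Sketch only: assumes the current directory is a configured R-devel build tree.
# MAKE_CMD and JOBS are illustrative names, not part of R's build system.
MAKE_CMD=${MAKE_CMD:-make}
JOBS=${JOBS:-8}

build_and_install_r() {
    # Build in parallel, then install R and its PDF manuals in parallel.
    "$MAKE_CMD" -j "$JOBS" &&
        "$MAKE_CMD" -j "$JOBS" install install-pdf
}

# Usage (not run here): build_and_install_r
```

Setting MAKE_CMD=echo gives a dry run that prints the make arguments
instead of executing them.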
When installing or checking a single package, the only thing done in
parallel is making any compiled code (2.8.x ran tests in the 'tests'
directory in parallel, but this is not currently done). The standard
procedures work safely with parallel make, but users who write
Makevars or Makefile files need to take this into account. For
example, yesterday's Rcsdp/src/Makevars has
PHONY: all
all: before $(SHLIB)
before: Csdp.ts
but this 'before' target has to be completed before $(SHLIB) can be
built (and it failed to install for me). I am aware that the
documentation did not stress this sufficiently in the past, and
'Writing R Extensions' has been revised to do so. (And the package
has been updated with alacrity, thank you.)
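
One way to express that ordering, similar to the pattern given in
'Writing R Extensions', is to make the preliminary target a prerequisite
of $(SHLIB) itself rather than a sibling of it under 'all'. The script
below is a minimal sketch that demonstrates this in a throwaway
directory. The Makefile is a stand-in for a package's src/Makevars, the
names demo.so and build.log and the echo/sleep recipes are hypothetical,
and it is not the actual change made to Rcsdp.

```shell
#!/bin/sh
# Demonstration only: a stand-in Makefile with hypothetical targets,
# showing that a prerequisite of $(SHLIB) is completed first under 'make -j'.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"

cat > Makefile <<'EOF'
SHLIB = demo.so
.PHONY: all before
all: $(SHLIB)
# 'before' is a prerequisite of $(SHLIB), so make must finish it first,
# however many jobs are allowed to run at once.
$(SHLIB): before ; @echo "link $(SHLIB)" >> build.log; touch $(SHLIB)
# The sleep makes the preliminary step slow on purpose.
before: ; @sleep 1; echo "before done" >> build.log
EOF

make -j4
build_order=$(tr '\n' ',' < build.log)
echo "$build_order"

# Remove the throwaway directory.
cd /
rm -rf "$demo_dir"
```

With 'all: before $(SHLIB)' instead, a parallel make is free to start
both prerequisites at the same time, which is the failure described
above. Since 'before' is declared phony here, $(SHLIB) is relinked on
every run.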
Because Windows users are much less likely to be aware of these issues
and because of the last para below, Uwe and I tweaked the procedures
for the Windows build machine so that packages are always
installed/checked with a non-parallel make.
Installing/updating packages in parallel can help a lot, and we've
made two changes to facilitate that. First, there is a new option for
R CMD INSTALL, --pkglock. This uses locks on a per-package basis and so
prevents more than one process trying to install a package at the same
time, but allows several packages to be installed to the same library
simultaneously. This places the onus on the caller to ensure that
dependencies are installed first, and the 'Ncpus' option to
install.packages() provides a way to marshal package installation to
make best use of multiple CPUs.
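
A minimal sketch of driving --pkglock by hand from a shell follows. It
assumes the source tarballs passed in do not depend on one another (the
caller is responsible for that, as noted above). The function name, the
R_CMD variable and the tarball names in the usage comment are
hypothetical.

```shell
#!/bin/sh
# Sketch only: install several mutually independent source packages at once.
# R_CMD is an illustrative variable so the command can be swapped for a dry run.
R_CMD=${R_CMD:-R}

install_independent() {
    # Start one installation per tarball in the background; --pkglock
    # locks per package, so they can share the same library.
    for tarball in "$@"; do
        "$R_CMD" CMD INSTALL --pkglock "$tarball" &
    done
    # Wait for all background installations to finish.
    wait
}

# Usage (not run here, file names are hypothetical):
#   install_independent pkgA_1.0.tar.gz pkgB_1.0.tar.gz
```

From within R, passing a value such as Ncpus = 4 to install.packages()
leaves the ordering of dependencies to R rather than to the caller.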
Under Windows 'make all' and 'make recommended' (but not 'make
distribution') can each be done in parallel. There are some question
marks over how well the 'make' used on Windows works in parallel (we
found one case where it worked incorrectly and had to rethink
share/make/winshlib.mk) so it should be used with caution.
Please give these new facilities a go and report (here) how you get
on.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595