[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)

Henrik Bengtsson hb at biostat.ucsf.edu
Tue Dec 4 21:24:15 CET 2012


In the 'parallel' package there is detectCores(), which tries its best
to infer the number of cores on the current machine.  This is useful
if you wish to utilize the *maximum* number of cores on the machine.
Several people use this to set the number of cores when parallelizing,
sometimes even hardcoded within 3rd-party scripts/package code, but
there are several settings where you wish to use fewer, e.g. in a
compute cluster where your R session is given only a portion of the
cores available.  Because of this, I'd like to propose adding
getCores(), which by default returns what detectCores() gives, but can
also be set to return what is assigned via setCores().  The idea is
that getCores() could replace most common uses of detectCores() and
provide more control.  An additional feature would be that 'parallel',
when loaded, would check for the command line argument --max-cores=<int>
and, if present, update the number of cores via setCores().  This would make
it possible for, say, a Torque/PBS compute cluster to launch an R
batch script as

  Rscript --max-cores=$PBS_NP script.R

and the only thing the script.R needs to know about is parallel::getCores().
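
A minimal sketch of how the proposed functions could behave, written as
ordinary user-level R code.  Note that getCores(), setCores() and the
--max-cores option do not exist in 'parallel'; they are the names
proposed above, and parseMaxCores() and .coresEnv are hypothetical
helpers introduced only for this illustration.

```r
# Sketch only: getCores()/setCores() are the PROPOSED names, not part of
# the 'parallel' package.  parseMaxCores() and .coresEnv are hypothetical.
library(parallel)

# Private storage for the user-assigned number of cores
.coresEnv <- new.env(parent = emptyenv())

# Assign the number of cores; invisibly returns the previous setting
setCores <- function(n) {
  n <- as.integer(n)
  stopifnot(length(n) == 1L, !is.na(n), n >= 1L)
  old <- .coresEnv$n
  .coresEnv$n <- n
  invisible(old)
}

# Return the assigned number of cores, falling back to detectCores()
# (which may itself return NA if the number cannot be determined)
getCores <- function() {
  n <- .coresEnv$n
  if (is.null(n)) n <- detectCores()
  n
}

# Look for --max-cores=<int> among the command line arguments;
# returns NULL if absent, otherwise the last value given
parseMaxCores <- function(args = commandArgs()) {
  pattern <- "^--max-cores=([0-9]+)$"
  hit <- grep(pattern, args, value = TRUE)
  if (length(hit) == 0L) return(NULL)
  as.integer(sub(pattern, "\\1", hit[length(hit)]))
}

# What 'parallel' would do when loaded, under this proposal
maxCores <- parseMaxCores()
if (!is.null(maxCores)) setCores(maxCores)
```

Until something like this is built in, a script can source the above
and be launched with the option placed after the script name, e.g.
"Rscript script.R --max-cores=$PBS_NP", since trailing arguments are
passed through and show up in commandArgs().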

I understand that I can do all this already in my own scripts, but I'd
like to propose a standard for R.

Comments?

/Henrik


