[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)

Henrik Bengtsson hb at biostat.ucsf.edu
Wed Dec 5 04:22:30 CET 2012


On Tue, Dec 4, 2012 at 5:25 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
> A somewhat simplistic answer is that we already have that with the "mc.cores" option. In multicore the default was to use all cores (without the need to use detectCores) and yet you could reduce the number as you want with mc.cores. This is similar to what you are talking about but it's not a sufficient solution.
>
> There are some plans for a somewhat more general approach. You may have noticed that mcaffinity() was added to query/control/limit the mapping of cores to tasks. It allows much more fine-grained control and better decisions on whether to recursively split jobs or not, as the state is global for the entire R. The (vague) plan is to generalize this for all platforms - if not binding to a particular core then at least to monitor the assigned number of cores.

I did not know about the concept of 'CPU affinity masks', but I can
quickly guess what the idea is, and it certainly provides richer
control of CPU/core resources.  Yes, it would be very helpful if it
worked across platforms.
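
For readers of the archive, here is a rough sketch of how the two
mechanisms Simon mentions could be combined today.  It assumes that
mcaffinity() is only available on Unix-alikes and returns NULL where
affinity is unsupported; the fallback value 2 for "mc.cores" and the
variable names are my own choices, not part of any plan discussed here.

```r
library(parallel)

# Upper bound chosen by the user or a job scheduler via options(mc.cores = ...).
# The fallback of 2 is an assumption made for this sketch.
limit <- as.integer(getOption("mc.cores", 2L))

# Cores this R process is allowed to run on, if the platform can tell us.
# mcaffinity() is only called on Unix-alikes; NULL means "unknown".
aff <- if (.Platform$OS.type == "unix") mcaffinity() else NULL

# Fall back to detectCores(), which may return NA on some systems.
usable <- if (is.null(aff)) detectCores() else length(aff)
if (is.na(usable)) usable <- 1L

# Never use more cores than either source allows, and never fewer than one.
n <- max(1L, min(limit, as.integer(usable)))
```

The resulting n could then be passed as mc.cores to mclapply().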

Thanks for the heads up.

/Henrik

>
> Cheers,
> Simon
>
>
> On Dec 4, 2012, at 3:24 PM, Henrik Bengtsson wrote:
>
>> In the 'parallel' package there is detectCores(), which tries its best
>> to infer the number of cores on the current machine.  This is useful
>> if you wish to utilize the *maximum* number of cores on the machine.
>> Several people use this to set the number of cores when parallelizing,
>> sometimes hardcoded within 3rd-party scripts/package code, but
>> there are several settings where you wish to use fewer, e.g. in a
>> compute cluster where your R session is given only a portion of the
>> cores available.  Because of this, I'd like to propose to add
>> getCores(), which by default returns what detectCores() gives, but can
>> also be set to return what is assigned via setCores().  The idea is
>> that getCores() could replace the most common uses of detectCores() and
>> provide more control.  An additional feature would be that 'parallel'
>> when loaded would check for command line argument --max-cores=<int>,
>> which will update the number of cores via setCores().  This would make
>> it possible for, say, a Torque/PBS compute cluster to launch an R
>> batch script as
>>
>>  Rscript --max-cores=$PBS_NP script.R
>>
>> and the only thing the script.R needs to know about is parallel::getCores().
>>
>> I understand that I can do all this already in my own scripts, but I'd
>> like to propose a standard for R.
>>
>> Comments?
>>
>> /Henrik
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
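
A minimal sketch of the getCores()/setCores() idea quoted above, as it
could be written in a user script today.  None of these functions exist
in 'parallel'; the names follow the proposal, while parseMaxCores() and
the private environment are hypothetical helpers added for illustration.
How a --max-cores=<int> argument would actually reach the script is also
an assumption, so the parser simply scans a character vector.

```r
# Private storage for the user-assigned core count (hypothetical).
.cores_env <- new.env(parent = emptyenv())

# Assign the number of cores; returns the previous setting invisibly.
setCores <- function(n) {
  n <- as.integer(n)
  stopifnot(length(n) == 1L, !is.na(n), n >= 1L)
  old <- .cores_env$n
  .cores_env$n <- n
  invisible(old)
}

# Return the assigned number of cores, else what detectCores() reports.
getCores <- function() {
  if (!is.null(.cores_env$n)) return(.cores_env$n)
  n <- parallel::detectCores()
  if (is.na(n)) 1L else as.integer(n)  # detectCores() may return NA
}

# Look for --max-cores=<int> among the given arguments; NULL if absent.
parseMaxCores <- function(args = commandArgs()) {
  hit <- grep("^--max-cores=[0-9]+$", args, value = TRUE)
  if (length(hit) == 0L) return(NULL)
  as.integer(sub("^--max-cores=", "", hit[1L]))
}

# What 'parallel' would do on load under the proposal.
maxCores <- parseMaxCores()
if (!is.null(maxCores)) setCores(maxCores)
```

A script would then call getCores() wherever it currently calls
detectCores().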


