[R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

Uwe Ligges
Tue Aug 29 18:27:54 CEST 2023


Dear all,

In today's R Core meeting, both the CRAN team and R Core agreed with
Simon's suggestion below.

Let me repeat the key points:

- We will try to add an interface to R that allows more unified
control over the various ways of parallelisation. That should allow
users to opt in to more than 2 cores and/or threads and/or processes.
Details will follow, as this is not simple.

- As long as users do not have simple ways of controlling how demanding
code is (e.g., different ways of parallelisation are used, even in nested
ways), CRAN will continue to protect users and enforce that packages do
not use more than 2 cores by default.
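
To illustrate the second point, here is a minimal sketch of how a package
could keep a conservative default while letting users opt in to more
cores. It is not the interface mentioned above, which does not exist yet.
The package name "mypkg", the option name "mypkg.cores" and the helper
default_cores() are all hypothetical; only getOption() and
parallel::detectCores() are existing R functions.

```r
# Hypothetical helper for a package called "mypkg". None of these names
# come from an agreed R interface; they are assumptions for illustration.
default_cores <- function(max_default = 2L) {
  # Explicit opt-in: the user sets options(mypkg.cores = n) to use more cores.
  opt <- getOption("mypkg.cores")
  if (!is.null(opt)) {
    return(max(1L, as.integer(opt)))
  }
  # No opt-in: never exceed the conservative default, even on large machines.
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L
  min(max_default, available)
}
```

A user on a dedicated machine would opt in with, say,
options(mypkg.cores = 16). The sketch does not address nested
parallelism (e.g., OpenMP threads inside several R processes), which is
one reason a unified mechanism is harder than a per-package option.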

Best,
Uwe Ligges



On 26.08.2023 02:05, Simon Urbanek wrote:
> 
> 
>> On Aug 26, 2023, at 11:01 AM, Dirk Eddelbuettel <edd using debian.org> wrote:
>>
>>
>> On 25 August 2023 at 18:45, Duncan Murdoch wrote:
>> | The real problem is that there are two stubborn groups opposing each
>> | other:  the data.table developers and the CRAN maintainers.  The former
>> | think users should by default dedicate their whole machine to
>> | data.table.  The latter think users should opt in to do that.
>>
>> No, it feels more like it is CRAN versus the rest of the world.
>>
> 
> 
> In reality it's more people running R on their laptops vs the rest of the world. Although people with laptops are the vast majority, they also are the least impacted by the decision going either way. I think Jeff summed up the core reasoning pretty well. Harm is done by excessive use, not the other way around.
> 
> That said, I think this thread is really missing the key point: there is no central mechanism that would govern the use of CPU resources. OMP_THREAD_LIMIT is just one of many ways and even that is vastly insufficient for reasons discussed (e.g., recursive use of processes). It is not CRAN's responsibility to figure out for each package what it needs to behave sanely - it has no way of knowing what type of parallelism is used, under which circumstances and how to control it. Only the package author knows that (hopefully), which is why it's on them. So instead of complaining here, a better use of time would be to look at what's being used in packages and come up with a unified approach to monitoring core usage and a mechanism by which the packages could self-govern to respect the desired limits. If there was one canonical place, it would also be easy for users to opt in/out as they desire - and I'd be happy to help if any components of it need to be in core R.
> 
> 
> 
>> Take but one example, and as I may have mentioned elsewhere, my day job consists in providing software so that (to take one recent example) bioinformatics specialists can slice huge amounts of genomics data.  When that happens on dedicated (expensive) hardware with dozens of cores, it would be wasteful to have an unconditional default of two threads. It would be the end of R among serious people, no more, no less. Can you imagine how the internet headlines would go: "R defaults to two threads".
>>
> 
> If you run on such a machine then you or your admin certainly know how to set the desired limits. From experience the problem is exactly the opposite - it's far more common for users to not know how to not overload such a machine. As for internet headlines, they will always be saying blatantly false things like "R is not for large data" even though we have been using it to analyze terabytes of data per minute ...
> 
> Cheers,
> Simon
> 
> 
> 
>> And it is not just data.table as even in the long thread over in its repo we have people chiming in using OpenMP in their code (as data.table does but which needs a different setter than the data.table thread count).
>>
>> It is the CRAN servers which (rightly !!) want to impose constraints for when packages are tested.  Nobody objects to that.
>>
>> But some of us wonder if setting these defaults for all R users, all the time, unconditionally, is really the right thing to do.  Anyway, Uwe told me he will take it to an internal discussion, so let's hope sanity prevails.
>>
> 
>


