[R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

Tim Taylor t|m@t@y|or @end|ng |rom h|ddene|eph@nt@@co@uk
Sat Aug 26 10:15:40 CEST 2023


I’m definitely sympathetic to both sides but have come around to the view of Greg, Dirk et al. It seems sensible to have a default that benefits the majority of “normal” users and to require explicit action in shared environments, not vice versa.

That is not to say that data.table could not do better with its heuristics (e.g. respecting CGroups settings, as raised by Henrik in https://github.com/Rdatatable/data.table/issues/5620), but the current defaults (50%) seem reasonable for, dare I say, most users.
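
For what "explicit action" might look like in a shared environment, here is a minimal sketch. It assumes data.table is installed; `cap_threads()` is a hypothetical helper made up for illustration, not part of any package, and the cap of 2 is just an example value.

```r
# Hypothetical helper (not part of data.table): clamp a requested thread
# count to the range [1, number of logical CPUs detected].
cap_threads <- function(requested) {
  n <- parallel::detectCores(logical = TRUE)
  if (is.na(n)) n <- 1L                  # detectCores() may return NA
  max(1L, min(as.integer(requested), n))
}

threads <- cap_threads(2)                # e.g. on a shared server

# Only touch data.table if it is actually available.
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDTthreads(threads)      # explicit cap for this session
  data.table::getDTthreads()             # check what is now in effect
}
```

As far as I know, the same cap can also be set outside R through the R_DATATABLE_NUM_THREADS environment variable, which may suit job schedulers better.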

Tim

> On 26 Aug 2023, at 03:20, Greg Hunt <greg using firmansyah.com> wrote:
> 
> The question should be, in how many cases is the current behaviour a
> problem?  In a shared environment, sure, you have to be more careful.  I'd
> say don't let the teenagers in there. The CRAN build server does need to do
> something to protect itself and I don't greatly mind the 2 thread limit, I
> implemented it by hand in my examples and didn't think about it
> afterwards.  On most 8, 16 or 32 way environments, dedicated or
> semi-dedicated to a particular workload, the defaults make some level of
> sense and they are probably most of the use cases.  Protecting high
> processor count environments from people who don't know what they are doing
> would seem to be a mismatch between the people and the environment, not so
> much a matter of software.
> 
>> On Sat, 26 Aug 2023 at 11:49, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
>> wrote:
>> 
>> You have a really bizarre way of twisting what others are saying, Dirk. I
>> have seen no-one here saying 'limit R to 2 threads' except for you, as a
>> way to paint opposing views to be absurd.
>> 
>> What _is_ being said is that _users need to be in control_, but _the
>> default needs to do least harm_ until those users take responsibility for
>> that control. Do not turn the throttle up until the user is prepared for
>> the consequences. Trying to subvert that responsibility into packages by
>> default is going to make more trouble than giving the people using those
>> packages simple examples of how to take that control.
>> 
>> A similar problem happens when users discover .Rprofile and insert all
>> those pesky library statements into it, making their scripts
>> irreproducible. If data.table made a warp10() function that activated this
>> current default performance setting then the user would be clearly at fault
>> for using it in an inappropriate environment like a shared HPC or the CRAN
>> servers. Don't put a brick on the accelerator of a teenager's car before
>> they even figure out where the brakes are.
>> 
>>> On August 25, 2023 6:17:04 PM PDT, Dirk Eddelbuettel <edd using debian.org>
>>> wrote:
>>> 
>>>> On 26 August 2023 at 12:05, Simon Urbanek wrote:
>>> | In reality it's more people running R on their laptops vs the rest of the world.
>>> 
>>> My point was that we also have 'single user on really Yuge workstation'.
>>> 
>>> Plus we all know that those users are often not sysadmins, and do not have
>>> our levels of accumulated systems knowledge.
>>> 
>>> So we should give _more_ power by default, not less.
>>> 
>>> | [...] they will always be saying blatantly false things like "R is not for large data"
>>> 
>>> By limiting R (and/or packages) to two threads we will only get more of
>>> these.  Our collective call.
>>> 
>>> This whole thread is pretty sad, actually.
>>> 
>>> Dirk
>>> 
>> 
>> --
>> Sent from my phone. Please excuse my brevity.
>> 
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>> 
> 
