[R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
Tim Taylor
t|m@t@y|or @end|ng |rom h|ddene|eph@nt@@co@uk
Sat Aug 26 10:15:40 CEST 2023
I’m definitely sympathetic to both sides but have come around to the view of Greg, Dirk et al. It seems sensible to have a default that benefits the majority of “normal” users and to require explicit action in shared environments, not vice versa.
That is not to say that data.table could not do better with its heuristics (e.g. respecting CGroups settings, as raised by Henrik in https://github.com/Rdatatable/data.table/issues/5620), but the current default (50% of logical CPUs) seems reasonable for, dare I say, most users.
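For anyone who does need to take that explicit action, it can be small. A minimal sketch, assuming data.table is installed; the variable names (old_threads, capped) are illustrative only, and the cap of 2 simply mirrors the CRAN limit discussed in this thread:

```r
# Cap data.table's thread use for one section of code, then restore it.
# The guard lets the script run even where data.table is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  # setDTthreads() returns the previous setting invisibly, so keep it.
  old_threads <- data.table::setDTthreads(2)

  # Threads data.table will now use (may be lower than 2 on small machines).
  capped <- data.table::getDTthreads()

  # ... multithreaded data.table work would go here ...

  # Hand control back to whatever was configured before.
  data.table::setDTthreads(old_threads)
}
```

In a shared environment the same cap can be applied without touching any code, via the R_DATATABLE_NUM_THREADS or OMP_THREAD_LIMIT environment variables described in ?setDTthreads.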
Tim
> On 26 Aug 2023, at 03:20, Greg Hunt <greg using firmansyah.com> wrote:
>
> The question should be, in how many cases is the current behaviour a
> problem? In a shared environment, sure, you have to be more careful. I'd
> say don't let the teenagers in there. The CRAN build server does need to do
> something to protect itself, and I don't greatly mind the 2-thread limit; I
> implemented it by hand in my examples and didn't think about it
> afterwards. On most 8, 16 or 32 way environments, dedicated or
> semi-dedicated to a particular workload, the defaults make some level of
> sense and they are probably most of the use cases. Protecting high
> processor count environments from people who don't know what they are doing
> would seem to be a mismatch between the people and the environment, not so
> much a matter of software.
>
>> On Sat, 26 Aug 2023 at 11:49, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
>> wrote:
>>
>> You have a really bizarre way of twisting what others are saying, Dirk. I
>> have seen no-one here saying 'limit R to 2 threads' except for you, as a
>> way to paint opposing views to be absurd.
>>
>> What _is_ being said is that _users need to be in control_, but _the
>> default needs to do least harm_ until those users take responsibility for
>> that control. Do not turn the throttle up until the user is prepared for
>> the consequences. Trying to subvert that responsibility into packages by
>> default is going to make more trouble than giving the people using those
>> packages simple examples of how to take that control.
>>
>> A similar problem happens when users discover .Rprofile and insert all
>> those pesky library statements into it, making their scripts
>> irreproducible. If data.table made a warp10() function that activated this
>> current default performance setting then the user would be clearly at fault
>> for using it in an inappropriate environment like a shared HPC or the CRAN
>> servers. Don't put a brick on the accelerator of a teenager's car before
>> they even figure out where the brakes are.
>>
>>> On August 25, 2023 6:17:04 PM PDT, Dirk Eddelbuettel <edd using debian.org>
>>> wrote:
>>>
>>>> On 26 August 2023 at 12:05, Simon Urbanek wrote:
>>> | In reality it's more people running R on their laptops vs the rest of
>>> | the world.
>>>
>>> My point was that we also have 'single user on really Yuge workstation'.
>>>
>>> Plus we all know that those users are often not sysadmins, and do not have
>>> our levels of accumulated systems knowledge.
>>>
>>> So we should give _more_ power by default, not less.
>>>
>>> | [...] they will always be saying blatantly false things like "R is not
>>> | for large data"
>>>
>>> By limiting R (and/or packages) to two threads we will only get more of
>>> these. Our collective call.
>>>
>>> This whole thread is pretty sad, actually.
>>>
>>> Dirk
>>>
>>
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>
> [[alternative HTML version deleted]]
>