[Rd] improving the performance of install.packages

Fri Nov 8 21:11:39 CET 2019

Since we are on this topic, another area of improvement is when 
install.packages() downloads hundreds of packages only to realize later 
that many of them actually fail to install because one of the packages 
they depend on (directly or indirectly) failed to install.
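
A rough sketch of the check I have in mind, in base R. The function name
blocked_by() and the toy dependency list are made up for illustration; a
real implementation would presumably build the dependency map from
something like tools::package_dependencies() and would live inside
install.packages() itself.

```r
# deps: named list mapping each package to its direct dependencies
# (toy data, hypothetical). Returns the packages that cannot be installed
# because they depend, directly or indirectly, on a failed package.
blocked_by <- function(failed, deps) {
  blocked <- character(0)
  repeat {
    bad <- c(failed, blocked)
    # packages with at least one direct dependency already known to be bad
    hit <- names(deps)[vapply(deps, function(d) any(d %in% bad), logical(1))]
    new <- setdiff(hit, bad)
    if (length(new) == 0L) break
    blocked <- c(blocked, new)
  }
  blocked
}

# Toy example: C depends on B, B depends on A, D is independent.
deps <- list(A = character(0), B = "A", C = "B", D = character(0))
blocked_by("A", deps)
```

With a check like this, a failure of A would let the installer skip
downloading B and C while still proceeding with D.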


On 11/8/19 11:55, Joshua Bradley wrote:
> I could do this...and I have before. This brings up a more fundamental
> question though. You're asking me to write code that changes the logic of
> the installation process (i.e. writing my own package installer). Instead
> of doing that, I would rather integrate that logic into R itself to improve
> the baseline installation process. This proposed API change would be
> additive and would not break legacy code.
> Package managers like pip (python), conda (python), yum (CentOS), apt
> (Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
> defaults) when to not download a package again. By proposing this change,
> I'm essentially asking that R follow some of the same conventions and best
> practices that other package managers have adopted over the decades.
> I assumed this list is used to discuss proposals like this to the R
> codebase. If I'm on the wrong list, please let me know.
> P.S. if this change happened, it would be interesting to study the effect
> it has on the bandwidth across all CRAN mirrors. A significant drop would
> turn into actual $$ saved.
> Josh Bradley
> On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
>> On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
>>> Hello,
>>> Currently if you install a package twice:
>>> install.packages("testit")
>>> install.packages("testit")
>>> R will build the package from source (depending on what OS you're using)
>>> twice by default. This becomes especially burdensome when people are using
>>> big packages (i.e. lots of depends) and someone has a script with:
>>> install.packages("tidyverse")
>>> ...
>>> ... later on down the script
>>> ...
>>> install.packages("dplyr")
>>> In this case, "dplyr" is part of the tidyverse and will install twice. As
>>> the primary "package manager" for R, it should not install a package twice
>>> (by default) when it can be so easily checked. Indeed, many people resort
>>> to writing a few lines of code to filter out already-installed packages. An
>>> r-help post from 2010 proposed a solution to improving the default
>>> behavior, by adding "force=FALSE" as an API addition to install.packages
>>> (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).
>>> Would the R-core devs still consider this proposal?
>> Whether or not they'd do it, it's easy for you to do it.
>> install.packages <- function(pkgs, ..., force = FALSE) {
>>     if (!force) {
>>       # keep only packages whose namespace cannot be loaded
>>       pkgs <- Filter(Negate(requireNamespace), pkgs)
>>     }
>>     utils::install.packages(pkgs, ...)
>> }
>> You might want to make this more elaborate, e.g. doing update.packages()
>> on the ones that exist.  But really, isn't the problem with the script
>> you're using, which could have done a simple test before forcing a slow
>> install?
>> Duncan Murdoch
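
For reference, a variant of the wrapper above that does not load any
namespaces as a side effect. This is only a sketch: the helper names
missing_packages() and install_if_missing() are hypothetical, and it
assumes that "already installed" simply means the package is listed by
installed.packages() for the current library paths, with no version check.

```r
# Hypothetical helper: package names not found in any library on lib.loc
# (lib.loc = NULL means all of .libPaths()).
missing_packages <- function(pkgs, lib.loc = NULL) {
  installed <- rownames(utils::installed.packages(lib.loc = lib.loc))
  setdiff(pkgs, installed)
}

# Hypothetical wrapper: install only what is missing unless force = TRUE.
install_if_missing <- function(pkgs, ..., force = FALSE) {
  todo <- if (force) unique(pkgs) else missing_packages(pkgs)
  # nothing to do: return early instead of calling install.packages()
  if (length(todo) == 0L) return(invisible(character(0)))
  utils::install.packages(todo, ...)
  invisible(todo)
}
```

With this, a script that calls install_if_missing("tidyverse") and later
install_if_missing("dplyr") would skip the second install, provided the
first one succeeded.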
