[Rd] improving the performance of install.packages

Joshua Bradley <jgbradley1 using gmail.com>
Sat Nov 9 03:24:58 CET 2019


Just to clarify the expected behavior I had in mind when proposing the
force argument.

force = TRUE means "force" an install no matter what (this matches the
current behavior of install.packages()).

force = FALSE means install a package only if it is not found in the local
R library on your system. If it is already installed, do nothing and return
as if a successful install occurred.
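
A minimal sketch of those semantics, written as a wrapper rather than a
change to install.packages() itself. The function name
install_packages_sketch and its installer argument are hypothetical and
exist only for illustration; the sketch checks only whether a package is
present, not its version or where it came from.

```r
# Hypothetical helper, not part of base R: illustrates the proposed
# `force` semantics on top of the existing install.packages().
install_packages_sketch <- function(pkgs, force = TRUE,
                                    installer = utils::install.packages,
                                    ...) {
  if (!force) {
    # system.file() returns "" when a package is not found in the library
    # paths, so nzchar() is TRUE only for packages already installed.
    is_installed <- vapply(
      pkgs,
      function(p) nzchar(system.file(package = p)),
      logical(1)
    )
    pkgs <- pkgs[!is_installed]
  }
  if (length(pkgs) > 0L) {
    installer(pkgs, ...)
  }
  # Invisibly return the packages that were handed to the installer.
  invisible(pkgs)
}
```

With force = FALSE, a call such as
install_packages_sketch(c("foo", "bar"), force = FALSE) would pass only the
missing packages on to install.packages() and do nothing for the rest.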



On Fri, Nov 8, 2019, 7:27 PM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 08/11/2019 6:17 p.m., Henrik Bengtsson wrote:
> > I believe introducing a backward compatible force=TRUE is a good
> > start, even if we're not ready for making force=FALSE the default at
> > this point.  It would help simplify quite-common instructions like
> >
> > if (!requireNamespace("BiocManager"))
> >    install.packages("BiocManager")
> > BiocManager::install(...)
> >
> > to
> >
> > install.packages("BiocManager", force=FALSE)
> > BiocManager::install(...)
>
> If simplifying instructions is the goal, it would be even simpler to
> just install it unconditionally:
>
> install.packages("BiocManager")
>
> Unlike dplyr (the original example in this thread), BiocManager is a
> tiny package with no compiling needed, so it hardly needs any time to
> install.
>
> And as previously mentioned, the backward compatible force=TRUE wouldn't
> help with the bad script at all.  In fact, the bad script could be fixed
> simply by realizing that
>
> install.packages("tidyverse")
>
> means it's actually a bad idea to also include
>
> install.packages("dplyr")
>
> because the former would install dplyr if and only if it was not already
> installed.  So it seems to me that fixing the bad script (by deleting
> one line) is the solution to the problem, not fixing R with a multistage
> series of revisions, tests, etc.
>
> Duncan Murdoch
>
> >
> > and more so when installing lots of packages conditionally, e.g.
> >
> > if (!requireNamespace("foo")) install.packages("foo")
> > if (!requireNamespace("bar")) install.packages("bar")
> > ...
> >
> > to
> >
> > install.packages(c("foo", "bar", ...), force = FALSE)
> >
> > Before deciding on making force=FALSE the new default, I think it
> > would be valuable to play the devil's advocate and explore and
> > identify all possible downsides of such a default, e.g. breaking
> > existing instructions, downstream package code that uses
> > install.packages() internally, and so on.
> >
> > /Henrik
> >
> > PS. Although the idea of having update.packages() install missing
> > packages is not bad, I'm not a fan, solely because of the risk that
> > installation instructions start using update.packages() instead,
> > which will certainly confuse those who don't know the history
> > (think require() vs library()).
> >
> > On Fri, Nov 8, 2019 at 3:11 PM Pages, Herve <hpages using fredhutch.org> wrote:
> >>
> >> Hi Gabe,
> >>
> >> Keeping track of where a package was installed from would be a nice
> >> feature. However it wouldn't be as reliable as comparing hashes to
> >> decide whether a package needs re-installation or not.
> >>
> >> H.
> >>
> >> On 11/8/19 12:37, Gabriel Becker wrote:
> >>> Hi Josh,
> >>>
> >>> There are a few issues I can think of with this. The primary one is that
> >>> CRAN(/Bioconductor) is not the only place one can install packages from. I
> >>> might have version x.y.z of a package installed that was, at the time, a
> >>> development version I got from GitHub, or installed locally, etc. Hell, I
> >>> might have a later devel version but want the CRAN version. Not common,
> >>> sure, but it will likely happen often enough that install.packages not doing
> >>> that for me when I tell it to is probably bad.
> >>>
> >>> Currently (though there has been some discussion of changing this) packages
> >>> do not remember where they were installed from, so R wouldn't know if the
> >>> version you have is actually the same one as on the repository you
> >>> pointed install.packages to or not. If that were changed and we knew that
> >>> we were getting the byte-identical package from the actual same source, I
> >>> think this would be a nice addition, though without it I think it would be
> >>> right a high but not high enough proportion of the time.
> >>>
> >>>> R will build the package from source (depending on what OS you're using)
> >>>> twice by default. This becomes especially burdensome when people are using
> >>>> big packages (i.e. lots of depends) and someone has a script with:
> >>>>
> >>>> install.packages("tidyverse")
> >>>> ...
> >>>> ... later on down the script
> >>>> ...
> >>>> install.packages("dplyr")
> >>>
> >>> I mean, IMHO and as I think Duncan was alluding to, that's straight up an
> >>> error by the script author. I think it's a few of them, actually, but it's at
> >>> least one. An understandable one, sure, but that's still what it is. Scripts
> >>> (which are meant to be run more than once, generally) usually shouldn't
> >>> really be calling install.packages in the first place, but if they do, they
> >>> should certainly not be installing umbrella packages and the packages they
> >>> bring with them separately.
> >>>
> >>> Even having one vectorized call to install.packages where all the packages
> >>> are installed would prevent this issue, including in the case where the
> >>> user doesn't understand the purpose of the tidyverse package. The
> >>> installation would still occur every time the script was run, though.
> >>>
> >>>
> >>> The last thing to note is that there are at least 2 packages which provide
> >>> a function which does this already (install.load and remotes), so people
> >>> can get this functionality if they need it.
> >>>
> >>>
> >>> On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley <jgbradley1 using gmail.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> I assumed this list is used to discuss proposals like this to the R
> >>>> codebase. If I'm on the wrong list, please let me know.
> >>>>
> >>>
> >>> This is the right place to discuss things like this. Thanks for starting
> >>> the conversation.
> >>>
> >>> Best,
> >>> ~G
> >>>
> >>>>
> >>>>
> >>>
> >>> ______________________________________________
> >>> R-devel using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >> --
> >> Hervé Pagès
> >>
> >> Program in Computational Biology
> >> Division of Public Health Sciences
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N, M1-B514
> >> P.O. Box 19024
> >> Seattle, WA 98109-1024
> >>
> >> E-mail: hpages using fredhutch.org
> >> Phone:  (206) 667-5791
> >> Fax:    (206) 667-1319
> >
>
>
