[R-pkg-devel] Package builds, installs, and runs but does not pass devtools::check()

Fri Jul 20 00:39:50 CEST 2018

Very nice discussion.  Thanks, Mark.

On Thu, Jul 19, 2018 at 3:20 AM, Mark van der Loo
<mark.vanderloo using gmail.com> wrote:
>
> Dear Mike, et al,
>
> My remarks are not necessarily related to tidyverse packages. The main point
> is that there are various purposes and business cases for writing code, and
> they may imply different trade-offs. Let me illustrate with some examples. I
> will focus on non-standard evaluation and dependencies.
>
>
> TL;DR version: (and this is my opinion, nobody has to agree).
>
> 1/Interactive use: user-level NSE ok (as in the not-a-pipe operator, dplyr
> verbs), use any package you want.
> 2/Applications & local packages: avoid NSE within functions, package an
> application with dependencies you need, write code with maintainers in mind.
> 3/Published R-packages: avoid NSE within functions, minimize dependencies to
> what you cannot avoid.
>
> Do Read version:
>
> 1/ One-off data analyses or exploratory data analyses. There are cases where
> you don't need to guarantee that your code will run a few years from now:
> you are the only user and once your task is done, you quickly need to move
> on to the next. Especially in EDA, I write a lot of code that is nice to
> keep in a structured project folder but most probably: 1) I will be its only
> user and 2) I will use it only for this one small project so maintenance is
> not an issue. Although I'm writing code in scripts, it is very close to
> interactive work on the command-line.
>
> In such cases I use whatever gets the job done, including dplyr, tidyr,
> ggplot2, data.table, you name it. Here I basically don't care about
> dependencies and if I write functions there are usually not many of them.
>
>
> 2/ Writing applications or packages for internal use. When you write an
> application you are usually committing to a longer maintenance horizon and
> more than one user. Good chance that you're not the user and also good
> chance you're not the only developer. There are many implications to this
> but since you need to maintain things for a longer term, dependencies can
> become a liability. Fortunately, there are techniques to contain
> dependencies, for example using packrat or by manually setting up a library
> containing the packages your application depends on. You can even use a
> docker instance. I have worked with custom libraries on several occasions.
> Since you (or someone else) are going to maintain the application, it is
> worthwhile to sit down and think about the best way to set up code so it
> remains maintainable. This includes questions like: can I easily understand
> what happens when reading it? What expertise does the maintainer need to
> understand it? Non-standard evaluation is generally much harder to reason
> about than standard evaluated code. This makes debugging and extending code
> harder in general.
>
> Now some people will argue that something like filter(data, x>1) is easier
> to understand than data[data$x > 1,,drop=FALSE]. I agree that on a very
> shallow level, filter(data, x>1) is easy to follow, in the sense of  "oh the
> author probably wants to filter something here". But when you are debugging,
> you need to understand in much greater detail what happens: you need to know
> that 'x>1' is an expression, that will be evaluated in the context of
> 'data'. You need to know about environments and parent environments and so
> on. All this knowledge can be avoided with data[data$x > 1,,drop=FALSE]. The
> latter also requires knowledge, but the concepts are much simpler, I think.
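
To make the comparison above concrete, here is a minimal base-R sketch. The
helper nse_filter() is hypothetical and written only for illustration; it is
not how dplyr::filter() is implemented. It shows the extra machinery (a
captured expression plus an evaluation environment) that a reader has to keep
in mind when debugging NSE code, next to the plain indexing version.

```r
# Toy data frame; the column name `x` mirrors the example in the text.
data <- data.frame(x = c(0, 2, 5), y = c("a", "b", "c"))

# Standard evaluation: the logical index is an ordinary R value.
keep <- data$x > 1
se_result <- data[keep, , drop = FALSE]

# Hypothetical NSE helper (illustration only, not dplyr's implementation):
# the condition is captured unevaluated, then evaluated with the columns
# of `df` in scope and the caller's environment as a fallback.
nse_filter <- function(df, cond) {
  expr <- substitute(cond)                 # capture `x > 1` as an expression
  rows <- eval(expr, df, parent.frame())   # look in df first, then the caller
  df[rows, , drop = FALSE]
}
nse_result <- nse_filter(data, x > 1)

# Name lookup is where the reasoning gets harder: a variable `x` in the
# calling environment is shadowed by the column `x`, while `limit` is
# only found in the calling environment.
x <- c(10, 10, 10)
limit <- 1
shadow_result <- nse_filter(data, x > limit)
```

Both versions select the same rows; the difference is in how much the reader
needs to know about where each name is looked up.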
>
> Hence, I tend to avoid NSE when writing applications, although there may
> still be good reasons to use it. Dependencies can be contained in various
> ways so they are not such a big problem.
>
> 3/ Writing packages for CRAN. Now you are committing to long-term
> maintenance, and usage by interactive users, application builders, and
> possibly other package builders. Now a dependency becomes a direct liability
> in the sense that the author of your dependency can change interfaces and
> ask you to comply with the new version. Also, and especially because of
> recursive dependencies, importing a package may give you a whole tail of
> dependencies. This increases load time but also install-time, especially on
> systems where you need to install from source. Light-weight packages
> therefore have real advantages in applications that run many times (like a
> standalone script that is fired by users of a web-application or scripts
> that are scheduled to run in high frequency). It is also worth mentioning
> that an Imports or Depends puts a burden on the maintainer of the package
> you depend on: before submitting to CRAN, a pkg developer needs to check
> against all reverse dependencies (preferably recursively).
>
> So now, it is even more worthwhile to sit down and think about what is the
> best way to set up your code. Well thought out code can be a pleasure to
> maintain. Code that is hastily put together is a nightmare.
>
> My philosophy is as follows: I depend on other packages only when they offer
> something that I cannot fairly trivially do myself. This may have to do with
> a statistical or numerical method I do not want or cannot implement, or it
> can have something to do with performance for example. This does indeed
> exclude much of the tidyverse almost automatically. Many tools in tidyverse
> make already existing functionality easier for (interactive) use. But since
> much of the functionality is already present in base R, and because I find
> NSE hard to reason about in a programming context I have until now not used
> any tidyverse packages as an Imports or Depends.
>
>
> Hope this helps,
> Best,
> Mark
>
> On Tue, Jul 17, 2018 at 23:10, Michael Hannon
> <jmhannon.ucdavis using gmail.com> wrote:
>>
>> Thanks, Mark.  Your points are well-taken, but I wouldn't refer to
>> this as a "small side-track".  You don't say so, but this could be
>> interpreted as a recommendation to avoid some or all of the
>> "tidyverse" in developing packages.  I'm actually quite comfortable
>> doing the base-R-style programming you recommend.  I've lately been
>> trying to make a point of using the "tidy" stuff, as that's what I'm
>> seeing almost exclusively from folks in my neighborhood these days.
>> ("Resistance is few-tile...")
>>
>> Also, it would seem to be a corollary that if the ultimate goal is to
>> make a package, then one shouldn't be using the convenience stuff
>> (pipes, dplyr, etc., etc.), even during the development stages.  Can
>> you comment?  Thanks.
>>
>> -- Mike
>>
>>
>> On Tue, Jul 17, 2018 at 2:53 AM, Mark van der Loo
>> <mark.vanderloo using gmail.com> wrote:
>> > Michael,
>> >
>> > Just a small side-track here. I would avoid using the not-a-pipe operator
>> > within functions or packages in general. It is great for interactive use,
>> > but it does make debugging and hence long-term maintenance of functions
>> > harder. There are two reasons for this. First, it hides intermediate
>> > results, and second, it adds several layers to the call stack making the
>> > output of functions like traceback() harder to interpret. I have documented
>> > a simple example here: https://github.com/chriscardillo/norris/issues/1
>> > (scroll down a bit).
>> >
>> > Regarding learning about quosures and so on. If the literal names of data
>> > frames are known, you could consider replacing
>> >
>> > some_var <-   next_data_frame %>% dplyr::select(-amount,...
>> >
>> > with something simpler like
>> >
>> > some_var <- next_data_frame[ !(names(next_data_frame) %in% c("amount", ... )) ]
>> >
>> > which might also save you some dependencies.
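
A small base-R sketch of the column-dropping idea above, on a made-up data
frame (the columns id, amount and note, and the vector drop_cols, are
assumptions for illustration). Note that testing names with `!=` against a
vector of several names compares element by element with recycling, so
`%in%` (or setdiff()) is the safer test when more than one column is dropped.

```r
# Made-up example data; only the column name `amount` comes from the thread.
next_data_frame <- data.frame(
  id     = 1:3,
  amount = c(9.5, 3, 7),
  note   = c("a", "b", "c")
)

# Columns to drop, given as plain character strings (no NSE involved).
drop_cols <- c("amount", "note")

# Keep every column whose name is not in drop_cols.
some_var <- next_data_frame[!(names(next_data_frame) %in% drop_cols)]

# Equivalent formulation using setdiff() on the column names.
some_var2 <- next_data_frame[setdiff(names(next_data_frame), drop_cols)]
```

Because the column names are ordinary strings, they can be stored in a
variable, passed as a function argument, or built programmatically.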
>> >
>> >
>> >
>> >
>> > Hope this helps,
>> > Best,
>> > Mark
>> >
>> >
>> >
>> > On Tue, Jul 17, 2018 at 11:28, Michael Hannon
>> > <jmhannon.ucdavis using gmail.com> wrote:
>> >>
>> >> Thanks to John and Zhian for their recent and informative comments.
>> >>
>> >> Regarding check() and NSE: the moral seems to be that a little
>> >> learning is a dangerous thing.  I'm off to try to bring quosure to
>> >> this issue.
>> >>
>> >> -- Mike
>> >>
>> >>
>> >> On Mon, Jul 16, 2018 at 2:38 PM, Zhian Kamvar <zkamvar using gmail.com>
>> >> wrote:
>> >> > Using dplyr like that is for exploratory data analysis. You'll want to
>> >> > refer to dplyr's "Programming with dplyr" vignette for using dplyr in a
>> >> > package:
>> >> >
>> >> >
>> >> > https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
>> >> >
>> >> > Hope that helps.
>> >> >
>> >> > On Jul 16, 2018, at 22:13 , Michael Hannon
>> >> > <jmhannon.ucdavis using gmail.com>
>> >> > wrote:
>> >> >
>> >> > Thanks, Georgi.  I've changed my approach and now do what I gather is
>> >> > recommended practice: put all external package names into the
>> >> > "Imports" section of the DESCRIPTION file and then use the
>> >> > fully-qualified names for functions from those packages, as:
>> >> >
>> >> >    dplyr::select()
>> >> >
>> >> > The "check" operation is still not entirely "happy" with me, but it
>> >> > doesn't flag any errors, and the package builds and runs.
>> >> >
>> >> > BTW, one source of "complaints" from "check()" is evidently the use of
>> >> > NSE in the tidyverse functions.  For instance, the line:
>> >> >
>> >> >    next_data_frame %>% dplyr::select(-amount,
>> >> >
>> >> > generates the message:
>> >> >
>> >> >    standardize_format: no visible binding for global variable ‘amount’
>> >> >
>> >> > where, of course, "amount" is one of the column headings in
>> >> > "next_data_frame".  There seems to be no harm done by this, and I plan
>> >> > to ignore such messages, but if there's some additional wisdom that
>> >> > applies here, I'd be happy to receive it.
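
A hedged sketch of why a static checker reports such a name. It uses
codetools::findGlobals() from the recommended codetools package (assumed to
be installed, as it is with standard R distributions), and two hypothetical
functions, standardize_nse() and standardize_se(), that are not from the
original package. base::subset() stands in for the NSE call so that the
example needs no extra dependencies.

```r
# Hypothetical function using NSE: `amount` is meant to be a column of `df`,
# but to a static code checker it looks like an undefined global variable.
standardize_nse <- function(df) {
  subset(df, select = -amount)
}

# List the names the function uses but does not define locally.
globals_nse <- codetools::findGlobals(standardize_nse, merge = FALSE)
print(globals_nse$variables)

# Standard-evaluation variant: the column is named by a string,
# so there is no bare symbol for the checker to flag.
standardize_se <- function(df) {
  df[setdiff(names(df), "amount")]
}
globals_se <- codetools::findGlobals(standardize_se, merge = FALSE)
print(globals_se$variables)

# In package code, declaring the name with utils::globalVariables("amount")
# is another commonly used way to tell the checker the symbol is intentional.
```

The first function is correct at run time; the report only reflects that the
checker cannot know `amount` will be resolved inside the data frame.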
>> >> >
>> >> > -- Mike
>> >> >
>> >> >
>> >> > On Sun, Jul 15, 2018 at 12:05 AM, Georgi Boshnakov
>> >> > <georgi.boshnakov using manchester.ac.uk> wrote:
>> >> >
>> >> >
>> >> > It seems that the R session used by 'check' doesn't look in the library
>> >> > used by your interactive session. This discrepancy may happen since the
>> >> > check tools do not load the same Renviron files as interactive sessions.
>> >> > This may result in different libraries in interactive and 'check'
>> >> > sessions. See ?Startup, especially section Note.
>> >> > It is difficult to give more specific advice without details of your
>> >> > setup.
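
One way to gather the details asked for above is to print the library search
path and the related environment variables in both sessions and compare them.
This is only a diagnostic sketch; the helper where_is() is a hypothetical
convenience wrapper, and "stats" is used as a stand-in for the package that
is reported missing.

```r
# Library trees searched by the current R session, in order.
lib_paths <- .libPaths()
print(lib_paths)

# Environment variables that commonly influence the library search path
# (see ?Startup and ?.libPaths); unset variables print as empty strings.
print(Sys.getenv(c("R_LIBS", "R_LIBS_USER", "R_ENVIRON_USER")))

# Hypothetical helper: where (if anywhere) is a package installed for this
# session? Returns character(0) when the package cannot be found.
where_is <- function(pkg) {
  find.package(pkg, quiet = TRUE)
}

# "stats" is a stand-in; substitute the package that check() cannot find.
print(where_is("stats"))
```

If the two sessions print different library paths, that points to the kind of
discrepancy described above.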
>> >> >
>> >> >
>> >> > Hope this helps,
>> >> > Georgi Boshnakov
>> >> >
>> >> >
>> >> > ________________________________________
>> >> > From: R-package-devel [r-package-devel-bounces using r-project.org] on
>> >> > behalf of Michael Hannon [jmhannon.ucdavis using gmail.com]
>> >> > Sent: 15 July 2018 02:13
>> >> > To: r-package-devel using r-project.org
>> >> > Subject: [R-pkg-devel] Package builds, installs, and runs but does not
>> >> > pass devtools::check()
>> >> >
>> >> > Greetings.  I'm working on a small package, and I'm using the devtools
>> >> > functions to create, build, etc., the package.
>> >> >
>> >> > As indicated in the subject line, I get no errors when I do:
>> >> >
>> >> > build()
>> >> > install()
>> >> >
>> >> >
>> >> > When I run a separate R session and load the package, i.e.,
>> >> >
>> >> > library(my_pkg)
>> >> >
>> >> >
>> >> > the package loads without error, and the two exported functions appear
>> >> > to work as advertised.
>> >> >
>> >> > OTOH, if I include devtools::check() in the construction of the
>> >> > package, I consistently get an error:
>> >> >
>> >> >    * installing *source* package ‘my_pkg’ ...
>> >> >    ** R
>> >> >    ** preparing package for lazy loading
>> >> >    Error in loadNamespace(from, lib.loc = .library) :
>> >> >      there is no package called ‘dplyr’
>> >> >    Error : unable to load R code in package 'my_pkg'
>> >> >
>> >> > Clearly there *is* a package called "dplyr" on my system (see the
>> >> > session info below, for instance).  And, as I've mentioned, the code
>> >> > *does* run, and I can watch it successfully reading CSV files.
>> >> >
>> >> > Here's the relevant part of my DESCRIPTION file:
>> >> >
>> >> >    Depends: R (>= 3.4.4)
>> >> >    Imports: readr,
>> >> >            dplyr,
>> >> >            ggplot2,
>> >> >            purrr,
>> >> >            magrittr
>> >> >
>> >> > I suspect the problem may be that I'm misunderstanding something about
>> >> > the `import::from()` function, which I'm using for the first time to
>> >> > load required functions into my code.  In each of the three files that
>> >> > use dplyr I have the line:
>> >> >
>> >> >    import::from(dplyr, mutate, filter, rename, select, setdiff, slice, "%>%")
>> >> >
>> >> > I've tried:
>> >> >
>> >> >    (1) putting that line in just one of the files (the lexically first
>> >> >        one)
>> >> >    (2) including different subsets of dplyr functions, as needed, in
>> >> >        the various files
>> >> >
>> >> > Needless to say, I haven't seen any improvement with any of the above
>> >> > (or any of the other thrashing I've done).
>> >> >
>> >> > If you can point me in the right direction, I'd appreciate it.  Thanks.
>> >> >
>> >> > -- Mike
>> >> >
>> >> >
>> >> > session_info()
>> >> >
>> >> > Session info ------------------------------------------------------------------
>> >> > setting  value
>> >> > version  R version 3.4.4 (2018-03-15)
>> >> > system   x86_64, linux-gnu
>> >> > ui       X11
>> >> > language en_US
>> >> > collate  en_US.UTF-8
>> >> > tz       America/Los_Angeles
>> >> > date     2018-07-14
>> >> >
>> >> > Packages ----------------------------------------------------------------------
>> >> > package    * version date       source
>> >> > assertthat   0.2.0   2017-04-11 CRAN (R 3.3.3)
>> >> > base       * 3.4.4   2018-03-16 local
>> >> > bindr        0.1.1   2018-03-13 CRAN (R 3.4.3)
>> >> > bindrcpp     0.2.2   2018-03-29 CRAN (R 3.4.4)
>> >> > compiler     3.4.4   2018-03-16 local
>> >> > crayon       1.3.4   2017-09-16 CRAN (R 3.4.1)
>> >> > datasets   * 3.4.4   2018-03-16 local
>> >> > devtools   * 1.13.6  2018-06-27 CRAN (R 3.4.4)
>> >> > digest       0.6.15  2018-01-28 CRAN (R 3.4.3)
>> >> > dplyr      * 0.7.6   2018-06-29 CRAN (R 3.4.4)
>> >> > glue         1.2.0   2017-10-29 CRAN (R 3.4.2)
>> >> > graphics   * 3.4.4   2018-03-16 local
>> >> > grDevices  * 3.4.4   2018-03-16 local
>> >> > magrittr     1.5     2014-11-22 CRAN (R 3.2.2)
>> >> > memoise      1.1.0   2017-04-21 CRAN (R 3.3.3)
>> >> > methods    * 3.4.4   2018-03-16 local
>> >> > pillar       1.3.0   2018-07-14 CRAN (R 3.4.4)
>> >> > pkgconfig    2.0.1   2017-03-21 CRAN (R 3.4.0)
>> >> > purrr        0.2.5   2018-05-29 CRAN (R 3.4.4)
>> >> > R6           2.2.2   2017-06-17 CRAN (R 3.4.0)
>> >> > Rcpp         0.12.17 2018-05-18 CRAN (R 3.4.4)
>> >> > rlang        0.2.1   2018-05-30 CRAN (R 3.4.4)
>> >> > stats      * 3.4.4   2018-03-16 local
>> >> > tibble       1.4.2   2018-01-22 CRAN (R 3.4.3)
>> >> > tidyselect   0.2.4   2018-02-26 CRAN (R 3.4.3)
>> >> > utils      * 3.4.4   2018-03-16 local
>> >> > withr        2.1.2   2018-03-15 CRAN (R 3.4.3)
>> >> >
>> >> >
>> >> >
>> >> > ______________________________________________
>> >> > R-package-devel using r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-package-devel