[R-pkg-devel] Examples taking too long depend on object that takes a while to generate

Thu Sep 15 16:21:48 CEST 2022

>>>>> John Harrold 
>>>>>     on Thu, 15 Sep 2022 05:11:15 -0700 writes:

    > Not to be pedantic, but it's not a dataset per se. It's an R object that the
    > examples need.

Yes.
(and see below)

    > On Thu, Sep 15, 2022 at 2:49 AM Duncan Murdoch <murdoch.duncan using gmail.com>
    > wrote:

    >> On 15/09/2022 5:29 a.m., Martin Maechler wrote:
    >> >>>>>> Duncan Murdoch
    >> >>>>>>      on Thu, 15 Sep 2022 04:42:04 -0400 writes:
    >> >
    >> >      > On 15/09/2022 3:45 a.m., Martin Maechler wrote:
    >> >      >>>>>>> Duncan Murdoch
    >> >      >>>>>>> on Wed, 14 Sep 2022 13:02:28 -0400 writes:
    >> >      >>
    >> >      >> > On 14/09/2022 12:43 p.m., Ivan Krylov wrote:
    >> >      >> >> On Wed, 14 Sep 2022 12:31:49 -0400
    >> >      >> >> Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
    >> >      >> >>
    >> >      >> >>> It's also possible to put .R files in the data directory, and they
    >> >      >> >>> are executed to create the data object.  I think that happens at the
    >> >      >> >>> time when you call data() rather than at install time, so it might
    >> >      >> >>> not be helpful.
    >> >      >> >>
    >> >      >> >> Some time ago I was hoping to compress a package of mine by generating a
    >> >      >> >> dataset during a data() call instead of loading it from an .rda file, but
    >> >      >> >> it turned out that the .R file is executed during R CMD build:
    >> >      >> >>
    >> >      >> >> https://github.com/r-devel/r-svn/blob/03df313ad37456c6a62158328d4e373408ce4d59/src/library/tools/R/build.R#L794
    >> >      >>
    >> >      >> > Thanks for that info.  That's not good for John, because the
    >> >      >> > architecture isn't known at build time.
    >> >      >>
    >> >      >> > Duncan Murdoch
    >> >      >>
    >> >      >> Sorry to muddy the water, but what *is* "build time"?
    >> >      >> There's a big difference between building
    >> >      >> 1) a source tarball                    and
    >> >      >> 2) a macOS or Windows binary package
    >> >      >>
    >> >      >> Unfortunately, the two situations are very different, notably in
    >> >      >> this case, where '(2)' is really much closer to the
    >> >      >> "install time" you mention.
    >> >      >>
    >> >
    >> >      > I meant building the tarball, and assumed that was what Ivan was talking
    >> >      > about as well.
    >> >
    >> >      > Duncan Murdoch
    >> >
    >> > OK, thank you for the clarification.
    >> >
    >> > Note that  `R CMD build --help`  mentions (among more)
    >> >
    >> >    --resave-data=        re-save data files as compactly as possible:
    >> >                          "no", "best", "gzip" (default)
    >> >    --no-resave-data      same as --resave-data=no
    >> >
    >> > so when building the package,
    >> > Ivan should get what he wanted with
    >> >
    >> >      R CMD build --no-resave-data  <pkg>
    >> >
    >> > no?
    >> 
    >> It's actually John Harrold who has the problem: a dataset that he wants
    >> to use in examples takes a long time to build, causing his examples
    >> to exceed the CRAN 5-second limit.
    >> 
    >> So what I was suggesting is that he should arrange for it to be created
    >> before running the example; the problem is that the dataset depends on
    >> the architecture of the machine that's running the example.  To follow
    >> my suggestion he would need to have the dataset created when the package
    >> was installed (or the binary was built).

and now John can use the data() "trick" Ivan recommended *if*
John builds his source tarball using
   R CMD build --no-resave-data

*and* puts the code that creates the R object into a  <his_pkg>/data/myRobj.R  file
*and* adds the call

      data(myRobj)

to his .onLoad() hook.

This would *not* create the object at package installation time,
but rather each time the package is loaded. That is still only
once for all the examples in  R CMD check <his_pkg>, since they
all run in a single R session.
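A minimal sketch of that .onLoad() hook, with hypothetical file and object names. One detail worth knowing: utils::data() loads into .GlobalEnv by default (its envir argument), which a package should not modify, so this sketch passes the package namespace explicitly. It assumes data/myRobj.R defines an object called myRobj and that the tarball was built with --no-resave-data as described above.

```r
## Hypothetical <his_pkg>/R/zzz.R for the data()-in-.onLoad route.
## Assumes <his_pkg>/data/myRobj.R creates an object named 'myRobj'.
.onLoad <- function(libname, pkgname) {
  ## Inside .onLoad, parent.env(environment()) is the package namespace,
  ## which is only sealed after .onLoad returns. Passing it as 'envir'
  ## avoids data()'s default of loading into .GlobalEnv.
  utils::data("myRobj", package = pkgname, lib.loc = libname,
              envir = parent.env(environment()))
  invisible()
}
```

The object is then rebuilt on every load, on whatever machine loads the package, which is what makes it safe for architecture-dependent content.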

OTOH, I do agree with you, Duncan, that in this case your
suggestion seems preferable: he would put the R code that creates
the object "nakedly" (at top level) into <his_pkg>/R/zzz.R  {zzz: to be collated alphabetically last},
so the object is created at install time and lives (hidden) in the his_pkg namespace.

He'd ideally also provide an *exported* function, say

    myData <- function(name = "<name_of_his_object>")  get(name, asNamespace("<his_pkg>"))

and then would use  myData()  {or myData(".....") } in his
package examples.
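A minimal sketch of the zzz.R variant, assuming hypothetical names (.myRobj for the hidden object) and a cheap placeholder standing in for the slow, architecture-dependent computation. It departs from the myData() above in one labeled respect: instead of asNamespace("<his_pkg>"), it uses topenv(environment()), which reaches the same namespace without hard-coding the package name and also works when the file is simply source()d.

```r
## Hypothetical <his_pkg>/R/zzz.R ("zzz" so it is collated last).
## Top-level code in R/ is evaluated when the package is installed (or the
## binary is built), so this object is computed on that machine and stored,
## hidden, in the package namespace.
.myRobj <- local({
  ## placeholder for the slow, architecture-dependent computation
  list(arch = R.version$arch,
       sizeof_pointer = .Machine$sizeof.pointer)
})

## Exported accessor: add export(myData) to NAMESPACE.
## Assumption: topenv(environment()) replaces asNamespace("<his_pkg>");
## it finds the enclosing namespace inside a package, and the global
## environment when this file is source()d outside a package.
myData <- function(name = ".myRobj") {
  get(name, envir = topenv(environment()), inherits = FALSE)
}
```

An example section would then just call obj <- myData(), and the examples pay only the cost of a lookup.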