[R-pkg-devel] Examples taking too long depend on object that takes a while to generate

Uwe Ligges ligges sending from statistik.tu-dortmund.de
Thu Sep 15 18:42:52 CEST 2022



On 15.09.2022 16:21, Martin Maechler wrote:
>>>>>> John Harrold
>>>>>>      on Thu, 15 Sep 2022 05:11:15 -0700 writes:
> 
>      > Not to be pedantic, but it's not a dataset per se. It's an R object that the
>      > examples need.
> 
> Yes.
> (and see below)
> 
>      > On Thu, Sep 15, 2022 at 2:49 AM Duncan Murdoch <murdoch.duncan using gmail.com>
>      > wrote:
> 
>      >> On 15/09/2022 5:29 a.m., Martin Maechler wrote:
>      >> >>>>>> Duncan Murdoch
>      >> >>>>>>      on Thu, 15 Sep 2022 04:42:04 -0400 writes:
>      >> >
>      >> >      > On 15/09/2022 3:45 a.m., Martin Maechler wrote:
>      >> >      >>>>>>> Duncan Murdoch
>      >> >      >>>>>>> on Wed, 14 Sep 2022 13:02:28 -0400 writes:
>      >> >      >>
>      >> >      >> > On 14/09/2022 12:43 p.m., Ivan Krylov wrote:
>      >> >      >> >> On Wed, 14 Sep 2022 12:31:49 -0400
>      >> >      >> >> Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>      >> >      >> >>
>      >> >      >> >>> It's also possible to put .R files in the data directory, and they
>      >> >      >> >>> are executed to create the data object.  I think that happens at the
>      >> >      >> >>> time when you call data() rather than at install time, so it might
>      >> >      >> >>> not be helpful.
>      >> >      >> >>
>      >> >      >> >> Some time ago I was hoping to compress a package of mine by generating a
>      >> >      >> >> dataset during a data() call instead of loading it from an .rda file, but
>      >> >      >> >> it turned out that the .R file is executed during R CMD build:
>      >> >      >> >>
>      >> >      >> >> https://github.com/r-devel/r-svn/blob/03df313ad37456c6a62158328d4e373408ce4d59/src/library/tools/R/build.R#L794
>      >> >      >>
>      >> >      >> > Thanks for that info.  That's not good for John, because the
>      >> >      >> > architecture isn't known at build time.
>      >> >      >>
>      >> >      >> > Duncan Murdoch
>      >> >      >>
>      >> >      >> Sorry to muddy the water, but what *is* "build time"?
>      >> >      >> There's a big difference between building
>      >> >      >> 1) a  source tarball                    and
>      >> >      >> 2) a  macOS or Windows binary package
>      >> >      >>
>      >> >      >> Unfortunately, the two situations are very different, notably in
>      >> >      >> this case, where '(2)' is really much closer to the
>      >> >      >> "install time" you mention.
>      >> >      >>
>      >> >
>      >> >      > I meant building the tarball, and assumed that was what Ivan was talking
>      >> >      > about as well.
>      >> >
>      >> >      > Duncan Murdoch
>      >> >
>      >> > OK, thank you for the clarification.
>      >> >
>      >> > Note that  `R CMD build --help`  mentions (among more)
>      >> >
>      >> >    --resave-data=        re-save data files as compactly as possible:
>      >> >                          "no", "best", "gzip" (default)
>      >> >    --no-resave-data      same as --resave-data=no
>      >> >
>      >> > so when building the package,
>      >> > Ivan should get what he wanted with
>      >> >
>      >> >      R CMD build --no-resave-data  <pkg>
>      >> >
>      >> > no?
>      >>
>      >> It's actually John Harrold who has the problem:  a dataset that he wants
>      >> to use in examples takes a long time to build, causing his examples
>      >> to exceed the CRAN 5-second limit.
>      >>
>      >> So what I was suggesting is that he should arrange for it to be created
>      >> before running the example; the problem is that the dataset depends on
>      >> the architecture of the machine that's running the example.  To follow
>      >> my suggestion he would need to have the dataset created when the package
>      >> was installed (or the binary was built).
> 
> and now John can use the data() "trick" Ivan recommended *if*
> John builds his source tarball using
>     R CMD build --no-resave-data
> 
> *and* puts the code that creates the R object into a  <his_pkg>/data/myRobj.R  file
> *and* adds a
> 
>        data(myRobj)
> 
> into his .onLoad() hook.

Hmmm, not good: a package is loaded several times during R CMD check,
and the user has to wait each time the package is loaded.

> 
> This would *not* create the object at package installation time,
> but each time the package is loaded ... which at least happens only
> once for all the examples in  R CMD check <his_pkg>
> 
> OTOH, I do agree with you, Duncan, that in this case your

indeed

> suggestion seems preferable: he should add the R code which creates
> the object somewhere "nakedly" in <his_pkg>/R/zzz.R  {zzz: to be alphabetically last}.
> Top-level code in R/ is evaluated once, at install time,
> and hence the object would live (hidden) in the his_pkg namespace.

I guess zzz.R could also be used to save it as an RData file in the data
dir (untested) and then immediately remove the object from the current
environment again?
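For illustration, a minimal sketch of Martin's zzz.R suggestion combined with his exported myData() accessor (it does not attempt the RData-in-data/ variant). The names `slow_build()` and `.myRobj` are hypothetical placeholders, not anything from John's package, and the list it builds merely stands in for the real, slow computation:

```r
## Sketch of a hypothetical <his_pkg>/R/zzz.R (not tested as a real package).
## Top-level code in R/ files is evaluated when the package is installed
## (or a binary package is built), so the slow object is created once per
## installation, on the machine whose architecture it must match.

## Hypothetical stand-in for John's slow, architecture-dependent computation.
slow_build <- function() {
  list(arch   = R.version$arch,        # platform the object was built on
       values = cumsum(seq_len(10)))   # placeholder for the real result
}

## "Naked" top-level assignment: the object ends up (unexported) in the
## namespace; the leading dot also hides it from a plain ls().
.myRobj <- slow_build()

## Accessor to be exported, i.e. listed in NAMESPACE as export(myData).
myData <- function(name = ".myRobj") {
  ## environment(myData) is the package namespace once installed, and the
  ## global environment when this file is simply source()d.
  get(name, envir = environment(myData), inherits = FALSE)
}

## What an examples section would then contain:
obj <- myData()
str(obj)
```

Martin's version hard-codes asNamespace("<his_pkg>"); environment(myData) is used here only so the sketch also runs when sourced outside a package. Inside an installed package both refer to the same namespace.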

Best,
Uwe Ligges



> He'd ideally also provide an *exported* function, say
> 
>      myData <- function(name = "<name_of_his_object>")  get(name, asNamespace("<his_pkg>"))
> 
> and then would use  myData()  {or myData(".....") } in his
> package examples.
> 
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel


