[R-pkg-devel] using package data in package code

Fri Jun 7 01:21:05 CEST 2019

On 06/06/2019 6:29 p.m., Linus Chen wrote:
> Dear all,
> 
> I am trying to understand the usage of package data.
> Suppose I create an object and save it
>      x <- 3
>      save(x, file = 'sysdata.rda')
> and then put it into the mypac/data/ directory.
> And suppose in my package code I have
>      f <- function(x) x+1
> Then, when the package has been built, installed, and loaded,
> the user gets access to mypac::f and mypac::x, and
>      y <- mypac::f(mypac::x)
> will give 4.
> 
> However, if I put the same code into the package, with the purpose to
> create an object mypac::y, the package would not build, with the error
> message: object 'x' not found.
> 
> So my question is: Is there a mechanism for the package data to be accessed by
> the package code?
> I have tried to look for an answer in "Writing R Extensions",
> and especially, I have read section 1.1.6 "Data in packages" carefully,
> but no answer for my question...
> 
> In my real package, x and f are certainly non-trivial. Basically x is
> a kind of look-up table.
> The 'y' that I wish to provide with the package to the user is an S4 object.
> And 'f', the function I am using to create 'y', is itself a non-trivial
> function defined in the package.
> 
> The work-around I can figure out are the following two:
> 1. put 'x' in a separate package, which 'mypac' will import.
>      Then the package code "y <- f(x)".
>      But it feels bad to have two packages for one integral set of information.
> 
> 2. Do the whole thing in two passes: in the first round the package
> contains only x and f.
>      Then, load the package, run y<-f(x), and add it to 'sysdata.rda'.
>      But this way, the two-pass operation has to be repeated each time
> the package is modified,
>      even if the change does not affect 'x', and only affects 'f'.
> 
> So is there a more elegant way to deal with this?
> 
> I would be grateful for any hints, or useful links. Thank you!

There are lots of ways to do this, depending on your needs.  The 
simplest is just to include

   x <- 3

in one of your .R files in your package; then all functions in your 
package can see it, but users of your package can't (unless you export it).
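
As a minimal sketch of that first suggestion, the three lines below could
sit together in one file under R/ (for example a hypothetical
R/internal.R).  It assumes x and f are defined before the line that builds
y, because package code is evaluated from top to bottom when the package is
installed; if they live in different files, the order in which the files
are read matters.

```r
# Hypothetical contents of one file in the package's R/ directory.
# Top-level code runs once, at install time, in the order it appears.

x <- 3                     # internal object; not exported

f <- function(x) x + 1     # the function from the question

y <- f(x)                  # x and f are already defined above, so this works
```

Here y is computed once when the package is installed and then stored with
the rest of the package code, so it is rebuilt automatically whenever f or
x changes.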

If x is really huge, you might think this will waste memory every time 
your package is loaded.  However, objects in packages can be "lazy 
loaded", so the memory will only be used if x is ever referenced.  Lazy 
loading is the default if the object is a function; for datasets it is 
a little trickier to set up.

For data, the sysdata.rda file is the easiest solution.  It's for data 
that is used by the functions in the package, not user-visible.  Just 
put x into that file in the package's R directory (mypac/R/sysdata.rda, 
not mypac/data/) to make use of it.
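
A rough sketch of creating that file follows.  It uses a temporary
directory as a stand-in for the package source tree, and the names
pkg_root, sysdata_path, ns and use_x are made up for illustration.  The
last few lines only imitate what package code sees, by loading the file
into an ordinary environment; in a real installed package R makes the
objects available to the package's functions without any such step.

```r
# Stand-in for the package source tree; "mypac" is the name from the question.
pkg_root <- file.path(tempdir(), "mypac")
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)

# Create the internal object and save it under R/, not data/.
x <- 3
sysdata_path <- file.path(pkg_root, "R", "sysdata.rda")
save(x, file = sysdata_path)   # the 'file' argument must be named
rm(x)                          # drop the local copy to show it is reloaded

# Imitate the package's view: an environment holding the saved objects.
ns <- new.env()
load(sysdata_path, envir = ns)

f <- function(x) x + 1
use_x <- function() f(x)       # hypothetical package function that refers to x
environment(use_x) <- ns       # x is now found in ns, as it would be in the package

result <- use_x()
```

The point of the sketch is only that x is referred to by its bare name
inside a function, with no call to load() or data() in the package code
itself.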

You don't need a two-pass solution.  As long as the sysdata.rda file 
stays unmodified, x will be available for code in the package to use as 
it sees fit, without any special handling.  (If x is created from 
complicated source code that changes regularly, I'd go with one of my 
earlier suggestions.)

Duncan Murdoch