[R-pkg-devel] Large Data Package CRAN Preferences

biii m@iii@g oii de@@ey@ws biii m@iii@g oii de@@ey@ws
Thu Dec 12 20:55:31 CET 2019


Hello,

I have two questions about creating data packages for data that will be
updated periodically and that total more than 5 MB in size.

The first question is:

The CRAN policy indicates that packages should generally be ≤5 MB in size.
Within a package that I'm working on, I need access to data that
are updated approximately quarterly, including the historical datasets
(specifically, these are the SDTM and CDASH terminologies in
https://evs.nci.nih.gov/ftp1/CDISC/SDTM/Archive/).

Each individual data update is currently about 1 MB when saved as its own
.RDS file, and the total current set is about 20 MB.  Since there will be
future updates, I think that the preferred approach is to generate one data
package for each update and then have an umbrella package that depends on
each of the individual data update packages.  That seems like it will
minimize space requirements on CRAN since old data will probably never need
to be updated (though I will still need to access it).

Is that an accurate summary of the best practice for creating these as a
data package?
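
To make the layout concrete, the umbrella package's DESCRIPTION might look
roughly like the fragment below.  This is only a sketch: every package name
in it is hypothetical, and listing the data packages under Imports (rather
than Depends) is an assumption on my part.

```text
Package: cdiscterms
Version: 0.1.0
Imports:
    cdiscterms.r2019q3,
    cdiscterms.r2019q4
```

Each quarterly release would then add one new small data package and one
new line to the umbrella package, while the older data packages stay
untouched.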

The second question is:

Assuming the best practice is the one I described above, the typical need
will be to combine the individual historical datasets for local use.  An
initial test indicates that combining the data takes about 1 minute, but
once combined, the result could be loaded faster.  I'd like to store the
combined dataset locally with the umbrella package.  However, I believe that
it is considered poor form to write within the library location for a
package except during installation.

What is the best practice for caching the resulting large dataset which is
locally-generated?
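
To make the question concrete, here is a rough sketch of the kind of caching
I have in mind.  The helper name load_combined() and the file name
combined.rds are hypothetical, the two toy data frames only stand in for the
real per-release datasets, and the cache directory is deliberately left as
an argument because where it should live is exactly what I am asking.

```r
# Hypothetical helper (not from any existing package): combine the per-release
# data frames once, cache the result as .RDS, and reuse the cache afterwards.
load_combined <- function(releases, cache_dir) {
  cache_file <- file.path(cache_dir, "combined.rds")
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # fast path: cache already built
  }
  # Slow path: tag each release with its name, then stack them.
  tagged <- Map(
    function(d, nm) data.frame(release = nm, d, stringsAsFactors = FALSE),
    releases, names(releases)
  )
  combined <- do.call(rbind, unname(tagged))
  rownames(combined) <- NULL
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(combined, cache_file)
  combined
}

# Toy stand-ins for two releases.  tempfile() is used only so the example
# runs anywhere; it is NOT a suggestion for the real cache location.
releases <- list(
  r1 = data.frame(code = c("C1", "C2"), term = c("A", "B"),
                  stringsAsFactors = FALSE),
  r2 = data.frame(code = "C3", term = "C", stringsAsFactors = FALSE)
)
cache_dir <- tempfile("terminology-cache-")
first  <- load_combined(releases, cache_dir)  # combines and writes the cache
second <- load_combined(releases, cache_dir)  # reads the cached .RDS
```

The first call pays the combination cost and every later call only reads the
cached file, so the open point is just which directory is acceptable for
cache_dir.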

Thanks,

Bill




