[R-pkg-devel] Large Data Package CRAN Preferences
Uwe Ligges
||gge@ @end|ng |rom @t@t|@t|k@tu-dortmund@de
Sun Dec 15 17:54:11 CET 2019
Ideally you would host the data elsewhere and submit a CRAN package
that allows users to easily get/merge/aggregate the data.
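
A minimal sketch of that approach, assuming each quarterly release is
hosted as a separate .rds file under some base URL. The function names,
the package name "mydatapkg", and the file layout are hypothetical and
only for illustration. The cache location uses tools::R_user_dir(),
which requires R >= 4.0.0, so nothing is written into the package
library.

```r
# Hypothetical helpers: names, package name and file layout are assumptions.

# Per-user cache directory (outside the package library); needs R >= 4.0.0.
data_cache_dir <- function(pkg = "mydatapkg") {
  dir <- tools::R_user_dir(pkg, which = "cache")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dir
}

# Download one release only if it is not already cached, then read it.
fetch_release <- function(release, base_url, cache_dir = data_cache_dir()) {
  dest <- file.path(cache_dir, paste0(release, ".rds"))
  if (!file.exists(dest)) {
    # Download to a temporary file first so a failed transfer
    # does not leave a broken file in the cache.
    tmp <- tempfile(fileext = ".rds")
    utils::download.file(paste0(base_url, "/", release, ".rds"), tmp,
                         mode = "wb", quiet = TRUE)
    file.copy(tmp, dest)
  }
  readRDS(dest)
}

# Combine all requested releases and cache the combined result, so the
# slow combination step runs only when the set of releases changes.
load_combined <- function(releases, base_url, cache_dir = data_cache_dir()) {
  combined_path <- file.path(cache_dir, "combined.rds")
  if (file.exists(combined_path)) {
    cached <- readRDS(combined_path)
    if (identical(cached$releases, releases)) {
      return(cached$data)
    }
  }
  parts <- lapply(releases, fetch_release,
                  base_url = base_url, cache_dir = cache_dir)
  data <- do.call(rbind, parts)
  saveRDS(list(releases = releases, data = data), combined_path)
  data
}
```

The sketch assumes every release has the same columns, so that rbind()
is enough to combine them. The first call pays the download and
combination cost, and later calls with the same releases read only the
combined file.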
Best,
Uwe Ligges
On 12.12.2019 20:55, bill using denney.ws wrote:
> Hello,
>
> I have two questions about creating data packages for data that will be
> updated and in total are >5 MB in size.
>
> The first question is:
>
> In the CRAN policy, it indicates that packages should be ≤5 MB in size in
> general. Within a package that I'm working on, I need access to data that
> are updated approximately quarterly, including the historical datasets
> (specifically, these are the SDTM and CDASH terminologies in
> https://evs.nci.nih.gov/ftp1/CDISC/SDTM/Archive/).
>
> Current individual data updates are approximately 1 MB when individually
> saved as .RDS, and the total current set is about 20 MB. I think that the
> preferred way to generate these packages since there will be future updates
> is to generate one data package for each update and then have an umbrella
> package that will depend on each of the individual data update packages.
> That seems like it will minimize space requirements on CRAN since old data
> will probably never need to be updated (though I will need to access it).
>
> Is that an accurate summary of the best practice for creating these as a
> data package?
>
> And a second question is:
>
> Assuming the best practice is the one I described above, the typical need
> will be to combine the individual historical datasets for local use. An
> initial test of the time to combine the data indicates that it would take
> about 1 minute to do, but after combination, the result could be loaded
> faster. I'd like to store the combined dataset locally with the umbrella
> package. I believe that it is considered poor form to write within the
> library location for a package except during installation.
>
> What is the best practice for caching the resulting large dataset which is
> locally-generated?
>
> Thanks,
>
> Bill
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>