[R-pkg-devel] Large Data Package CRAN Preferences

Uwe Ligges
Sun Dec 15 17:54:11 CET 2019


Ideally you would host the data elsewhere and submit a CRAN package 
that allows users to easily get/merge/aggregate the data.
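
As a rough illustration of what such a package could offer, here is a minimal
sketch in R. It is only one possible shape, under stated assumptions: the
function names, the placeholder package name "sdtmterms", and the file name
"combined.rds" are all hypothetical; tools::R_user_dir() requires R >= 4.0.0;
and every release is assumed to read into a data frame with the same columns.
Files are kept in a per-user cache directory rather than in the package
library, which also covers the question of where to store the combined
dataset.

```r
# Hypothetical helpers: names, file layout, and cache location are assumptions.

# Per-user cache directory (outside the package library).
# tools::R_user_dir() needs R >= 4.0.0; "sdtmterms" is a placeholder name.
cache_dir <- function(pkg = "sdtmterms") {
  dir <- tools::R_user_dir(pkg, which = "cache")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dir
}

# Download one release file into the cache unless it is already there.
fetch_release <- function(file, base_url, dir = cache_dir()) {
  dest <- file.path(dir, file)
  if (!file.exists(dest)) {
    utils::download.file(paste0(base_url, file), dest, mode = "wb", quiet = TRUE)
  }
  dest
}

# Read every release with `reader`, stack the results, and cache the
# combined data frame as an .rds file so later calls load quickly.
combine_releases <- function(paths, reader, dir = cache_dir(), refresh = FALSE) {
  combined <- file.path(dir, "combined.rds")
  if (!refresh && file.exists(combined)) {
    return(readRDS(combined))
  }
  pieces <- lapply(paths, function(p) {
    d <- reader(p)
    # Record which release each row came from.
    d$source_file <- rep(basename(p), nrow(d))
    d
  })
  out <- do.call(rbind, pieces)
  saveRDS(out, combined)
  out
}
```

A caller would pass the archive location as base_url and its own vector of
release file names to fetch_release(), then hand the returned paths and a
suitable reader function to combine_releases(). Because the cached file is
reused whenever it exists, refresh = TRUE is needed after a new release is
added.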

Best,
Uwe Ligges



On 12.12.2019 20:55, bill using denney.ws wrote:
> Hello,
>
> I have two questions about creating data packages for data that will be
> updated and in total are >5 MB in size.
>
> The first question is:
>
> The CRAN policy indicates that packages should generally be <=5 MB in
> size.  Within a package that I'm working on, I need access to data that
> are updated approximately quarterly, including the historical datasets
> (specifically, these are the SDTM and CDASH terminologies in
> https://evs.nci.nih.gov/ftp1/CDISC/SDTM/Archive/).
>
> Current individual data updates are approximately 1 MB when individually
> saved as .RDS, and the total current set is about 20 MB.  I think that the
> preferred way to generate these packages since there will be future updates
> is to generate one data package for each update and then have an umbrella
> package that will depend on each of the individual data update packages.
> That seems like it will minimize space requirements on CRAN since old data
> will probably never need to be updated (though I will need to access it).
>
> Is that an accurate summary of the best practice for creating these as a
> data package?
>
> And a second question is:
>
> Assuming the best practice is the one I described above, the typical need
> will be to combine the individual historical datasets for local use.  An
> initial test of the time to combine the data indicates that it would take
> about 1 minute to do, but after combination, the result could be loaded
> faster.  I'd like to store the combined dataset locally with the umbrella
> package.  I believe that it is considered poor form to write within the
> library location for a package except during installation.
>
> What is the best practice for caching the resulting large dataset which is
> locally-generated?
>
> Thanks,
>
> Bill


