[R-pkg-devel] Large Data Package CRAN Preferences
bill using denney.ws
Sun Dec 15 18:27:09 CET 2019
Hi Uwe,
Thanks for this information, and it makes sense to me. Is there a preferred way to cache the data locally?
None of the ways that I can think to cache the data sound particularly good, and I wonder if I'm missing something. The ideas that occur to me are:
1. Download them into the package directory `path.package("datapkg")`, but that would require an action to be performed when the package is installed, and I'm unaware of any way to trigger an action at installation time.
2. Have a user-specified cache directory (e.g. `options("datapkg_cache"="/my/cache/location")`), but that would require setup by every user. (Not horrible, but it would likely significantly increase the number of user issues with the package.)
3. Have a user-specified cache directory like #2, but have it default to somewhere in their home directory like `file.path(Sys.getenv("HOME"), "datapkg_cache")` if they have not set the option.
To me #3 sounds best, but I'd like to be sure that I'm not missing something.
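For concreteness, here is a minimal sketch of what I have in mind for #3 (the `datapkg_cache` option name and the `combine_all_terminologies()` helper are placeholders, not actual code from the package):

    # Resolve the cache directory: use the user-set option if present,
    # otherwise fall back to a default under the user's home directory.
    datapkg_cache_dir <- function() {
      dir <- getOption("datapkg_cache",
                       default = file.path(Sys.getenv("HOME"), "datapkg_cache"))
      if (!dir.exists(dir)) {
        dir.create(dir, recursive = TRUE)
      }
      dir
    }

    # Load the combined dataset from the cache if it exists; otherwise
    # build it (the slow step) and save it for next time.
    # combine_all_terminologies() is a placeholder for the merge code.
    load_combined <- function() {
      cache_file <- file.path(datapkg_cache_dir(), "combined.rds")
      if (file.exists(cache_file)) {
        readRDS(cache_file)
      } else {
        combined <- combine_all_terminologies()
        saveRDS(combined, cache_file)
        combined
      }
    }

With something like that, the option would only matter to users who want a non-default cache location.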
Thanks,
Bill
-----Original Message-----
From: Uwe Ligges <ligges using statistik.tu-dortmund.de>
Sent: Sunday, December 15, 2019 11:54 AM
To: bill using denney.ws; r-package-devel using r-project.org
Subject: Re: [R-pkg-devel] Large Data Package CRAN Preferences
Ideally you would host the data elsewhere and submit a CRAN package that allows users to easily get/merge/aggregate the data.
Best,
Uwe Ligges
On 12.12.2019 20:55, bill using denney.ws wrote:
> Hello,
>
> I have two questions about creating data packages for data that will
> be updated and in total are >5 MB in size.
>
> The first question is:
>
> The CRAN policy indicates that packages should in general be ≤5 MB in
> size.  Within a package that I'm working on, I need access to
> data that are updated approximately quarterly, including the
> historical datasets (specifically, these are the SDTM and CDASH
> terminologies in https://evs.nci.nih.gov/ftp1/CDISC/SDTM/Archive/).
>
> Current individual data updates are approximately 1 MB when
> individually saved as .RDS, and the total current set is about 20 MB.
> Since there will be future updates, I think that the preferred way to
> structure this is to generate one data package for each update and then
> have an umbrella package that depends on each of the individual data
> update packages.  That seems like it would minimize space requirements
> on CRAN, since old data will probably never need to be updated (though
> I will still need to access it).
>
> Is that an accurate summary of the best practice for creating these as
> a data package?
>
> And a second question is:
>
> Assuming the best practice is the one I described above, the typical
> need will be to combine the individual historical datasets for local
> use.  An initial test indicates that combining the data takes about 1
> minute, but once combined, the result can be loaded faster.  I'd like
> to store the combined dataset locally with the umbrella package, but I
> believe it is considered poor form to write within a package's library
> location except during installation.
>
> What is the best practice for caching the resulting large,
> locally-generated dataset?
>
> Thanks,
>
> Bill