[R-pkg-devel] Best practices for distributing large data files

Neal Fultz n|u|tz @end|ng |rom gm@||@com
Wed Feb 16 05:10:17 CET 2022


I host my clients' packages on AWS; the cost is minimal, and installing
from there onto other systems on Amazon is extremely fast. Here's the
script I use:

https://github.com/njnmco/njnmverse/blob/master/Makefile



On Tue, Feb 15, 2022 at 6:55 PM Ayala Hernandez, Rafael <
r.ayala14 using imperial.ac.uk> wrote:

> Dear all,
>
> I am currently trying to think of the best way to distribute large sets of
> coefficients required by my package asteRisk.
>
> At the moment, I am using an accessory data package, asteRiskData,
> available from a drat repository, that bundles all of the required
> coefficients already parsed and stored as R objects.
>
> However, as my package grows, the amount of data required is also growing.
> This has made the size of asteRiskData grow larger, reaching 99.99 MB at
> the moment, which is at the limit of what would be uploadable to GitHub.
> Since the source package must be uploaded as a single .tar.gz file for the
> drat repository, I see no easy workaround, other than splitting it into
> multiple, accessory data packages.
>
> I believe this option could become rather troublesome in the future, if
> the number of accessory data packages starts to grow too much.
>
> So I would like to ask, is there any recommended procedure for
> distributing such large data files?
>
> Another option that has been suggested to me is not to use an accessory
> data package at all, but instead download and parse the required data on
> demand from the corresponding internet resources, store them locally, and
> then have future sessions load them from the local copies, therefore not
> requiring download and parsing in every R session, but only once (or
> possibly only once in a while, if the associated resource is updated).
> However, this would be leaving files of relatively large size (several tens
> of MB) scattered in the local environment of users (instead of having them
> all centralized in the accessory data package). Is this option acceptable
> as well?
>
> Thanks a lot in advance for any insights
>
> Best wishes,
>
> Rafa
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
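
For reference, the download-once-and-cache option described in the quoted
message could look roughly like the sketch below. It is only an illustration
under stated assumptions: the helper name cached_data(), its arguments, and the
example URL are hypothetical and not part of asteRisk. It relies on
tools::R_user_dir() (available since R 4.0.0) to keep all cached files in one
per-package directory rather than scattered around the user's file system.

```r
# Hypothetical helper (not part of asteRisk): download a raw file once,
# parse it, and store the parsed object in a per-package cache directory.
# Later calls read the cached copy and skip the download and parsing.
cached_data <- function(name, url, parse,
                        cache_dir = tools::R_user_dir("asteRisk", which = "cache"),
                        refresh = FALSE) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(name, ".rds"))

  if (refresh || !file.exists(path)) {
    tmp <- tempfile()
    on.exit(unlink(tmp), add = TRUE)
    # mode = "wb" avoids corrupting binary files on Windows
    utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
    saveRDS(parse(tmp), path)   # store the already-parsed R object
  }

  readRDS(path)
}

# Usage (placeholder URL, shown commented out so nothing is downloaded):
# coefs <- cached_data("coefs", "https://example.org/coefs.csv", utils::read.csv)
# coefs <- cached_data("coefs", "https://example.org/coefs.csv", utils::read.csv,
#                      refresh = TRUE)  # force a re-download if upstream changed
```

Keeping everything under a single cache directory means a user can inspect or
delete the files in one place, which addresses part of the "scattered files"
concern; whether to refresh automatically is left to the caller here.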
