[R-pkg-devel] How to store large data to be used in an R package?

Dirk Eddelbuettel edd at debian.org
Tue Mar 26 15:35:13 CET 2024


On 25 March 2024 at 11:12, Jairo Hidalgo Migueles wrote:
| I'm reaching out to seek some guidance regarding the storage of relatively
| large data, ranging from 10-40 MB, intended for use within an R package.
| Specifically, this data consists of regression and random forest models
| crucial for making predictions within our R package.
| 
| Initially, I attempted to save these models as internal data within the
| package. While this approach maintains functionality, it has led to a
| package size exceeding 20 MB. I'm concerned that this would complicate
| submitting the package to CRAN in the future.
| 
| I would greatly appreciate any suggestions or insights you may have on
| alternative methods or best practices for efficiently storing and accessing
| this data within our R package.

Brooke and I wrote a paper on one way of addressing this: move the large
objects into a separate 'data' package, hosted in a drat repo and made
accessible via an Additional_repositories: entry in the main package.
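
As a rough sketch of what that looks like in the DESCRIPTION file of the main
package (all names here are placeholders and not from the original question:
'examplepkg' is the main package, 'examplepkgdata' is the data package, and
the URL stands in for wherever the drat repo is served from):

```
Package: examplepkg
Suggests:
    examplepkgdata
Additional_repositories: https://example.github.io/drat
```

The data package goes in Suggests: rather than Imports: or Depends: because it
is not on CRAN, so the main package has to keep working (with reduced
functionality) when the data package is not installed.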

See https://journal.r-project.org/archive/2017/RJ-2017-026/index.html for the
paper which contains a nice slow walkthrough of all the details.
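
Since the data package is only suggested, code in the main package has to
check for it before use. A minimal sketch of that pattern follows. The package
name 'examplepkgdata', the object name 'fitted_models', and the repo URL are
all hypothetical placeholders, and the paper covers the details more
carefully:

```r
# Hypothetical names throughout: 'examplepkgdata' is the suggested data
# package and 'fitted_models' is a list of model objects it would provide.

# TRUE if the suggested data package can be loaded, FALSE otherwise.
has_model_data <- function(pkg = "examplepkgdata") {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Predict with the stored models, failing with an informative message
# when the data package is not installed.
predict_with_models <- function(newdata, pkg = "examplepkgdata") {
  if (!has_model_data(pkg)) {
    stop("Package '", pkg, "' is required for predictions. Install it via\n",
         "  install.packages('", pkg, "',\n",
         "                   repos = 'https://example.github.io/drat')",
         call. = FALSE)
  }
  # Fetch the (hypothetical) model list from the data package.
  models <- getExportedValue(pkg, "fitted_models")
  lapply(models, stats::predict, newdata = newdata)
}
```

Examples, tests and vignettes that need the models would be guarded by the
same has_model_data() check so that they are skipped when the data package is
absent.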

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org