[R-pkg-devel] Writing a data package with large files
Dirk Eddelbuettel
edd at debian.org
Sat Jul 6 16:10:48 CEST 2019
On 6 July 2019 at 09:27, Alex Hallam wrote:
| I have been working on making a data package. The goal is to one day push
| it to CRAN, but I am having 2 problems (one warning and one note) from R
| CMD. I think the problems are due to having large files (a 453M csv.7z raw
| file and a 75M .rda file)
Well that is in excess of what the manual ("Writing R Extensions") and the
CRAN Repo Policy allow.
So I suggest you think about other ways for users to get your data. Brooke
and I once described a particular approach here:
https://journal.r-project.org/archive/2017/RJ-2017-026/index.html
and I include the main gist / abstract:
Hosting Data Packages via drat: A Case Study with Hurricane Exposure Data
G. Brooke Anderson and Dirk Eddelbuettel
The R Journal (2017) 9:1, pages 486-497.
Abstract Data-only packages offer a way to provide extended functionality
for other R users. However, such packages can be large enough to exceed the
package size limit (5 megabytes) for the Comprehensive R Archive Network
(CRAN). As an alternative, large data packages can be posted to additional
repositories beyond CRAN itself in a way that allows smaller code packages
on CRAN to access and use the data. The drat package facilitates creation
and use of such alternative repositories and makes it particularly simple
to host them via GitHub. CRAN packages can draw on packages posted to drat
repositories through the use of the ‘Additional_repositories’ field in the
DESCRIPTION file. This paper describes how R users can create a suite of
coordinated packages, in which larger data packages are hosted in an
alternative repository created with drat, while a smaller code package that
interacts with this data is created that can be submitted to CRAN.
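
In practice the split might look roughly like the sketch below. The package name 'mydatapkg', the account 'myaccount', and the two helper functions are placeholders made up for illustration, not names from the paper; only requireNamespace(), install.packages() and the drat calls mentioned in the comments are real.

```r
# Hypothetical names: 'mydatapkg' is the large data package and
# 'myaccount' is the GitHub account hosting the drat repository.
data_pkg  <- "mydatapkg"
drat_repo <- "https://myaccount.github.io/drat"

# DESCRIPTION of the small code package (the one submitted to CRAN)
# would then carry something like:
#   Suggests: mydatapkg
#   Additional_repositories: https://myaccount.github.io/drat
#
# The maintainer publishes the data package tarball into a local clone
# of the drat repository, e.g.
#   drat::insertPackage("mydatapkg_0.1.0.tar.gz", repodir = "~/git/drat")

# Code package side: use the data only if the data package is installed.
has_data_pkg <- function(pkg = data_pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Build the command a user would run to fetch the data package.
install_hint <- function(pkg = data_pkg, repo = drat_repo) {
  sprintf('install.packages("%s", repos = "%s", type = "source")', pkg, repo)
}

if (!has_data_pkg()) {
  message("Data package not installed; to get it, run:\n  ", install_hint())
}
```

The point of keeping the data package in Suggests and testing for it at run time is that the code package still installs and passes its checks when the data package is absent.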
Hth, Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org