[R-pkg-devel] Writing a data package with large files

Dirk Eddelbuettel edd at debian.org
Sat Jul 6 16:10:48 CEST 2019

On 6 July 2019 at 09:27, Alex Hallam wrote:
| I have been working on making a data package. The goal is to one day push
| it to CRAN, but I am having 2 problems (one warning and one note) from R
| CMD. I think the problems are due to having large files (a 453M csv.7z raw
| file and a 75M .rda file)

Well that is in excess of what the manual ("Writing R Extensions") and the
CRAN Repo Policy allow.

So I suggest you think about other ways for users to get your data. Brooke
and I once described a particular approach here:


and I include the main gist / abstract:

  Hosting Data Packages via drat: A Case Study with Hurricane Exposure Data  
  G. Brooke Anderson and Dirk Eddelbuettel
  The R Journal (2017) 9:1, pages 486-497.

  Abstract Data-only packages offer a way to provide extended functionality
  for other R users. However, such packages can be large enough to exceed the
  package size limit (5 megabytes) for the Comprehensive R Archive Network
  (CRAN). As an alternative, large data packages can be posted to additional
  repositories beyond CRAN itself in a way that allows smaller code packages
  on CRAN to access and use the data. The drat package facilitates creation
  and use of such alternative repositories and makes it particularly simple
  to host them via GitHub. CRAN packages can draw on packages posted to drat
  repositories through the use of the ‘Additional_repositories’ field in the
  DESCRIPTION file. This paper describes how R users can create a suite of
  coordinated packages, in which larger data packages are hosted in an
  alternative repository created with drat, while a smaller code package that
  interacts with this data is created that can be submitted to CRAN.
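
To make the setup described in the abstract concrete, here is a minimal
sketch of the pieces involved. The package names "mycodepkg" and
"mydatapkg" and the GitHub account "someuser" are hypothetical placeholders,
and the layout is only one plausible arrangement, not necessarily the exact
one used in the paper. It writes the DESCRIPTION fields a small CRAN-side
code package might declare, and shows a guard that checks for the optional
data package at run time.

```r
# Hypothetical names throughout: "mycodepkg", "mydatapkg" and the GitHub
# account "someuser" are placeholders, not taken from the paper.
drat_url <- "https://someuser.github.io/drat"

# Fields the small CRAN-side code package would declare in DESCRIPTION:
# the data package is only suggested, and the drat repository that hosts
# it is listed under Additional_repositories.
desc <- data.frame(
  Package = "mycodepkg",
  Suggests = "mydatapkg",
  Additional_repositories = drat_url,
  stringsAsFactors = FALSE
)
desc_file <- tempfile("DESCRIPTION_")
write.dcf(desc, file = desc_file)

# Maintainer side (needs drat and a built tarball, so left commented out):
# drat::insertPackage("mydatapkg_0.1.0.tar.gz", repodir = "~/git/drat")

# Code in the CRAN package checks for the data before using it.
has_data <- function(pkg = "mydatapkg") {
  requireNamespace(pkg, quietly = TRUE)
}

if (!has_data()) {
  message("Data package missing; install it with:\n",
          "  install.packages(\"mydatapkg\", repos = \"", drat_url,
          "\", type = \"source\")")
}
```

Because the data package sits in Suggests rather than Depends or Imports,
the code package can still be installed and checked when the drat
repository is unreachable, as long as every use of the data is wrapped in
a guard like has_data().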

Hth, Dirk

http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
