[R-pkg-devel] Retrieving versioned csv datasets for use in an R package
Jan van der Laan
rhe|p @end|ng |rom eoo@@dd@@n|
Fri Feb 14 17:10:58 CET 2025
Not an answer, but a request from someone who often works behind firewalls
and/or on machines not connected to the internet: please provide a way for
the package to look for the data at a user-specified location, such as
a local directory.
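
To make the request concrete, here is a minimal sketch of such a lookup. It is
only an illustration under stated assumptions, not a recommendation of a
specific API: the function name mypackage_data_dir, the option
"MyPackage.data_dir", the environment variable MYPACKAGE_DATA_DIR, the file
name inputs.csv and the example.org URL are all hypothetical. A user-specified
directory is checked first; only when none is given does the function fall
back to a per-user cache and download whatever is missing.

```r
# Hypothetical helper: return the directory holding one version of the csv files.
# Lookup order: explicit argument, R option, environment variable, user cache.
mypackage_data_dir <- function(version, local_dir = NULL,
                               files = c("inputs.csv"),  # placeholder file name
                               base_url = "https://example.org/mypackage-data") {
  # 1. A user-specified location (useful offline or behind a firewall).
  if (is.null(local_dir)) {
    local_dir <- getOption("MyPackage.data_dir",
                           default = Sys.getenv("MYPACKAGE_DATA_DIR", unset = ""))
  }
  if (nzchar(local_dir)) {
    dir <- file.path(local_dir, version)
    if (!dir.exists(dir)) {
      stop("No data for version ", version, " under ", local_dir)
    }
    return(dir)  # never touches the network on this path
  }

  # 2. Otherwise use a per-user cache and download only the missing files.
  dir <- file.path(tools::R_user_dir("MyPackage", which = "cache"), version)
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  for (f in files) {
    dest <- file.path(dir, f)
    if (!file.exists(dest)) {
      download.file(paste(base_url, version, f, sep = "/"), dest, mode = "wb")
    }
  }
  dir
}
```

With something like this, an offline user could copy the files to a directory
of their choice, call options(MyPackage.data_dir = "<that directory>"), and the
package would read <that directory>/2.1.0 without attempting a download.
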
Best,
Jan
On 14-02-2025 15:54, John Clarke wrote:
> Hi folks,
>
> I've looked around for this particular question, but haven't found a good
> answer. I have a versioned dataset that includes about 6 csv files that
> total about 15MB for each version. The versions get updated every few years
> or so and are used to drive the model which was written in C++ but is now
> inside an Rcpp wrapper. Apart from the fact that CRAN does not permit large
> files, I want to have a better way for users to access particular versions
> of the dataset.
>
> Usage idea:
> # The following would hopefully also download default/most recent version
> # of the csv files from CRAN (if allowed) or GitHub or some other repository
> # for academic open source data.
> install.packages("MyPackage")
> mypackage = new(MyPackage)
>
> Then, if necessary, the user could change the dataset used with something
> like:
> mypackage$dataset("2.1.0"), which would retrieve the new csv files if they
> haven't already been downloaded and update the data_folder path internally
> to point to the 2.1.0 directory.
>
> Requirements:
> - The dataset is csv (not an R data object) and the Rcpp MyPackage expects
> this format
> - Would be nice to properly include citations for the data as they will
> likely be initially released through a journal publication
>
> What is the best practice for this sort of dataset management for a package
> in R? Is it okay to use GitHub to store and version the data? Or is it
> preferred to use an R package (ignoring the file size limit)? Or some other
> open source data hosting? I see https://r-universe.dev/ as an option as
> well. In any case, what is the proper mechanism for retrieving/caching the
> data?
>
> Thanks,
>
> -John
>
> John Clarke | Senior Technical Advisor |
> Cornerstone Systems Northwest | john.clarke using cornerstonenw.com
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel