[R] Issue with dataset inclusion in CRAN packages

csrabak crabak at acm.org
Sun Jun 26 23:18:24 CEST 2011


On 26/6/2011 17:43, Frank Harrell wrote:
> I was glad to see the new rpart.plot package by Stephen Milborrow.  I was
> however a bit concerned that Stephen distributed a dataset I created, and
> renamed the dataset (from titanic3 to ptitanic) in the process [with some
> justification, as some variables were omitted].  Fortunately Stephen
> included the script he used to download the dataset from our web site, and
> gave full credit to us.  What concerns me is that the rpart.plot package
> does not contain many functions but the package is as large as packages
> containing hundreds of functions.  This is due to the inclusion of the
> dataset.  I would prefer that authors provide the URL so that users can
> easily install the binary R data frame directly from our web site (we
> even have an automated way to do this: require(Hmisc); getHdata(titanic3)).
> This will allow users to profit from possible future data corrections as
> well as making the package much more compact.  Thanks for listening.  I'm
> writing to r-help because this may apply to other R packages as well.
>
Frank,

I can understand your concern and at first thought would even second it.

On the other hand, I think there are reasonable explanations for why
authors prefer to include the datasets, especially if the data will be
used in examples:

1) Docs written based on the datasets stay in sync with the data frames
shipped with the package;

2) In several environments access to the web may be restricted, and
getHdata or read.table("<url>") may not be allowed.

my 0.019999...

Regards,

--
Cesar Rabak


