[R] R dataset copyrights

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Apr 25 08:23:38 CEST 2014

On 24/04/2014 22:33, Greg Snow wrote:
> Many, probably even most (but I have not checked) of the datasets
> available in R packages have help files with a references section.
> That section should lead you to the original source, which may carry
> the copyright information and is what you should cite.
>
> My understanding (but I am not a lawyer, do not play one on TV, or
> claim to be any type of legal expert) is that you cannot copyright
> facts, but you can copyright the layout and presentation of facts.  So
> real data about the real world cannot be copyrighted, but the layout
> and presentation can be.  So if you photocopy a page from a journal
> and post that you may be in trouble for copying and distributing the
> layout and presentation of the data, but not the data itself.  But if
> you transcribe the numbers into a file to be read by a computer, then
> you have copied only the facts, which are not copyrighted.

You most likely also copied the layout (which numbers/strings are in 
which rows ...).  There are legal precedents involving telephone 
directories, for example.

There was a May 2007 thread about this: see 
https://stat.ethz.ch/pipermail/r-help/2007-May/131780.html and replies.

> On the other hand simulated or otherwise made up datasets could be
> considered to be fiction and therefore able to be copyrighted.  I
> remember hearing (but I don't remember where or when) that some
> textbook authors are encouraged to use simulated data instead of real
> data (it can have the same mean, sd, etc. as a real dataset so the
> interpretation is the same) in textbooks so that the copyright of the
> textbook also applies to the data.  It is not always clear whether a
> dataset is fact or simulated, so it is best to obtain permission or
> check official statements from the source.
>
> Beyond what is legal you should consider what is right.  Even if you
> don't have to cite a data source, you should try to give credit where
> it is due (and possibly blame if there is an error).  At a minimum you
> should cite original sources when they can be found and also mention
> where you obtained the data if not from the original source.  Think of
> the effort that people went through to collect the data and make it
> available to you.  How would you feel if you put that much effort into
> something and then someone else stole the credit or other rewards?  Many
> data sources have statements on how the data can be used, and it is
> best to follow those instructions/requests.  Is it really that hard to
> add a reference to where the data came from and how you obtained it?
>
> In some educational cases it may be better to initially hide the
> source of the data.  For example, the outliers dataset in the
> TeachingDemos package would be a lot less useful for its intended
> purpose if students were to read its help page before analyzing it.
> I therefore have no problem with teachers using it without telling
> students where it came from (and since it is simulated I could
> possibly claim copyright), though I would appreciate a mention after
> the fact (once the lesson is learned the teacher could say "by the
> way, this data came from ...").  I expect that others would feel
> similarly (I should add a note to that effect to the documentation
> page).  But you should check the sources to see if this is
> specifically allowed or disallowed.
>
> I probably have not fully answered your question, but hopefully this
> gives a little more guidance.
>
> On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup
> <soeren.groettrup at gmail.com> wrote:
>> Hi everybody,
>>
>> I've been searching the web for quite some time now and haven't found a
>> satisfying answer. I was wondering if the datasets provided within the R
>> packages are open, and thus can be used in publications? Concretely, can the
>> data, for example, be exported from R and uploaded in a different format
>> (like csv) to a website to be accessible for students to work with the data
>> in SPSS or Matlab? Is it enough to cite the source or paper, or do I need
>> permission for every dataset?
>>
>> Thanks in advance for your replies,
>> Sören Gröttrup

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
