[R] R dataset copyrights
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Apr 25 08:23:38 CEST 2014
On 24/04/2014 22:33, Greg Snow wrote:
> Many, probably even most (but I have not checked) of the datasets
> available in R packages have help files with a references section.
> That section should lead you to an original source that may have the
> copyright information and is what should be referenced.
> My understanding (but I am not a lawyer, do not play one on TV, or
> claim to be any type of legal expert) is that you cannot copyright
> facts, but you can copyright the layout and presentation of facts. So
> real data about the real world cannot be copyrighted, but the layout
> and presentation can be. So if you photocopy a page from a journal
> and post that, you may be in trouble for copying and distributing the
> layout and presentation of the data, but not the data itself. But if
> you transcribe the numbers into a file to be read by the computer, then
> you have just copied the facts, which are not copyrighted.
You most likely also copied the layout (which numbers/strings are in
which rows ...). There are legal precedents involving telephone
directories, for example.
There was a May 2007 thread about this: see
https://stat.ethz.ch/pipermail/r-help/2007-May/131780.html and replies.
> On the other hand simulated or otherwise made up datasets could be
> considered to be fiction and therefore able to be copyrighted. I
> remember hearing (but I don't remember where or when) that some
> textbook authors are encouraged to use simulated data instead of real
> data (it can have the same mean, sd, etc. as a real dataset so the
> interpretation is the same) in textbooks so that the copyright of the
> textbook also applies to the data. It is not always clear whether a
> dataset is fact or simulated, so it is best to obtain permission or
> check official statements from the source.
> Beyond what is legal you should consider what is right. Even if you
> don't have to cite a data source, you should try to give credit where
> it is due (and possibly blame if there is an error). At a minimum you
> should cite original sources when they can be found and also mention
> where you obtained the data if not from the original source. Think of
> the effort that people went through to collect the data and make it
> available to you. How would you feel if you put that much effort into
> something and then someone else stole the credit or other rewards? Many
> data sources have statements on how the data can be used, and it is
> best to follow those instructions/requests. Is it really that hard to
> add a reference to where the data came from and how you obtained it?
> In some educational cases it may be better to initially hide the
> source of the data. For example, the outliers dataset in the
> TeachingDemos package would be a lot less useful for its intended
> purposes if students were to read its help page before analyzing it.
> Therefore I have no problem with teachers using it without telling
> students where it came from (and since it is simulated I could
> possibly claim copyright), though I would appreciate a mention after
> the fact (once the lesson is learned the teacher could say "by the
> way, this data came from ...") and I expect that others would feel
> similarly (I should add a note to that effect to the documentation
> page). But you should check the sources to see if this is
> specifically allowed or disallowed.
> I probably have not fully answered your question, but hopefully this
> gives a little more guidance.
> On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup
> <soeren.groettrup at gmail.com> wrote:
>> Hi everybody,
>> I've been searching the web for quite some time now and haven't found a
>> satisfying answer. I was wondering if the datasets provided within the R
>> packages are open, and thus can be used in publications? Concretely, can the
>> data, for example, be exported from R and uploaded in a different format
>> (like csv) to a website to be accessible for students to work with the data
>> in SPSS or Matlab? Is it enough to cite the source or paper, or do I need
>> permission for every dataset?
>> Thanks in advance for your replies,
>> Sören Gröttrup
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595