[R] R dataset copyrights
538280 at gmail.com
Thu Apr 24 23:33:44 CEST 2014
Many, probably even most (though I have not checked), of the datasets
available in R packages have help files with a references section.
That section should lead you to the original source, which may carry
the copyright information and is what you should cite.
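As a rough illustration of following a dataset's help page back to its
source, here is a minimal sketch in R. The helper name
dataset_provenance is hypothetical (it is not part of any package), and
it assumes the help file is named after the dataset, which holds for
mtcars in the datasets package but not for every dataset. Interactively,
help("mtcars") shows the same information.

```r
# Hypothetical helper: collect the \source and \references sections
# from the Rd documentation of a dataset in an installed package.
# Assumption: the Rd file is called <topic>.Rd.
dataset_provenance <- function(topic, package) {
  db <- tools::Rd_db(package)
  hit <- which(basename(names(db)) == paste0(topic, ".Rd"))
  if (length(hit) == 0) {
    stop("no Rd file named ", topic, ".Rd in package ", package)
  }
  rd <- unclass(db[[hit[1]]])
  # Each top-level element of an Rd object carries its section tag
  # in the "Rd_tag" attribute.
  tags <- vapply(rd, function(x) {
    tag <- attr(x, "Rd_tag")
    if (is.null(tag)) "" else tag
  }, character(1))
  keep <- rd[tags %in% c("\\source", "\\references")]
  # Flatten the nested text pieces and tidy the whitespace.
  text <- paste(unlist(keep), collapse = "")
  trimws(gsub("[[:space:]]+", " ", text))
}

provenance <- dataset_provenance("mtcars", "datasets")
cat(strwrap(provenance, width = 70), sep = "\n")
```

The result is only as good as the help page: if a package author left
those sections empty, the function returns an empty string and you will
have to look elsewhere.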
My understanding (but I am not a lawyer, do not play one on TV, and do
not claim to be any type of legal expert) is that you cannot copyright
facts, but you can copyright the layout and presentation of facts. So
real data about the real world cannot be copyrighted, but a particular
layout and presentation of it can be. If you photocopy a page from a
journal and post it, you may be in trouble for copying and
distributing the layout and presentation of the data, though not for
the data itself. But if you transcribe the numbers into a file to be
read by a computer, you have copied only the facts, which are not
copyrighted.
On the other hand, simulated or otherwise made-up datasets could be
considered fiction and therefore able to be copyrighted. I
remember hearing (but I don't remember where or when) that some
textbook authors are encouraged to use simulated data instead of real
data (it can have the same mean, sd, etc. as a real dataset so the
interpretation is the same) in textbooks so that the copyright of the
textbook also applies to the data. It is not always clear whether a
dataset is fact or simulated, so it is best to obtain permission or
check official statements from the source.
Beyond what is legal, you should consider what is right. Even if you
don't have to cite a data source, you should try to give credit where
it is due (and possibly blame if there is an error). At a minimum you
should cite original sources when they can be found and also mention
where you obtained the data if not from the original source. Think of
the effort that people went through to collect the data and make it
available to you. How would you feel if you put that much effort into
something and then someone else stole the credit or other rewards?
Many data sources have statements on how the data can be used, and it
is best to follow those instructions or requests. Is it really that
hard to add a reference to where the data came from and how you
obtained it?
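One way to keep that reference attached to the data is to write a short
provenance note next to any exported file. The sketch below is only an
illustration: the file names and the wording of the note are my own
choices, and mtcars stands in for whatever dataset you are exporting.
It says nothing about whether a given dataset may be redistributed,
which still has to be checked with the source.

```r
# Export a bundled dataset to CSV and write a provenance note beside it.
# File names and note wording are illustrative choices, not a standard.
out_dir <- tempdir()
csv_path <- file.path(out_dir, "mtcars.csv")
note_path <- file.path(out_dir, "mtcars_SOURCE.txt")

write.csv(datasets::mtcars, csv_path, row.names = TRUE)

note <- c(
  "Dataset: mtcars",
  "Obtained from: the 'datasets' package distributed with R",
  paste("R version:", R.version.string),
  paste("Exported on:", format(Sys.Date())),
  "Original source: see the Source section of help(\"mtcars\") in R"
)
writeLines(note, note_path)

# Read the CSV back to confirm the export round-trips.
check <- read.csv(csv_path, row.names = 1)
```

Students working in SPSS or Matlab then get the CSV and the note
together, so the credit travels with the data.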
In some educational cases it may be better to initially hide the
source of the data. For example, the outliers dataset in the
TeachingDemos package would be a lot less useful for its intended
purpose if students were to read its help page before analyzing it.
I therefore have no problem with teachers using it without telling
students where it came from (and since it is simulated I could
possibly claim copyright), though I would appreciate a mention after
the fact: once the lesson is learned, the teacher could say "by the
way, this data came from ...". I expect that others would feel
similarly (I should add a note to that effect to the documentation
page). But you should check the sources to see whether this is
specifically allowed or disallowed.
I probably have not fully answered your question, but hopefully this
gives a little more guidance.
On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup
<soeren.groettrup at gmail.com> wrote:
> Hi everybody,
> I've been searching the web for quite a time now and haven't found a
> satisfying answer. I was wondering if the datasets provided within the R
> packages are open, and thus can be used in publications? Concretely, can the
> data, for example, be exported from R and uploaded in a different format
> (like csv) to a website to be accessible for students to work with the data
> in SPSS or Matlab? Is it enough to cite the source or paper or do I need a
> permission for every dataset?
> Thanks in advance for your replies,
> Sören Gröttrup
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com