[Bioc-devel] License question for experimental data package

Tim Triche, Jr. tim.triche at gmail.com
Fri Mar 4 16:03:57 CET 2016


Data (facts) are not copyrightable, but databases (collections of facts) can be.  See Feist v. Rural for precedent; in short, there must be a non-obvious and creative aspect to the database (in how the facts are selected or arranged) for it to be elevated to copyrightable status.  I doubt that a collection of datasets would clear this bar, but it's still worth noting.

--t

> On Mar 4, 2016, at 6:22 AM, Robert M. Flight <rflight79 at gmail.com> wrote:
> 
> I am pretty sure in general "data" is not copyrightable per se (
> http://www.lib.umich.edu/copyright/facts-and-data), so while I might
> contact the original authors as a courtesy, if the data has been released
> into any public database, then you should be free to do with it as you
> please. Providing the original accession numbers for the data and relevant
> citations (if they exist) so that it is easy for you and others to be given
> credit if the data is used would be a good thing to do.
> 
> Also, I would personally go with the CC0 (waiver of copyright, see
> https://wiki.creativecommons.org/wiki/CC0) for a data package, as the data
> is already publicly available, you have just packaged it together into a
> useful set.
> 
> My 2 cents.
> 
> -Robert
> 
> Robert M Flight, PhD
> Bioinformatics Research Associate
> Resource Center for Stable Isotope Resolved Metabolomics
> Manager, Systems Biology and Omics Integration Journal Club
> Markey Cancer Center
> CC434 Roach Building
> University of Kentucky
> Lexington, KY
> 
> Twitter: @rmflight
> Web: rmflight.github.io
> ORCID: http://orcid.org/0000-0001-8141-7788
> EM rflight79 at gmail.com
> PH 502-509-1827
> 
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. - Ronald Fisher
> 
> 
> 
> On Fri, Mar 4, 2016 at 8:52 AM Kasper Daniel Hansen <
> kasperdanielhansen at gmail.com> wrote:
> 
>> For data packages, which do not contain any code, it seems weird to use a
>> software license such as GPL or GPL-2.  It seems better to use something
>> like Artistic-2.0 or one of the CC licenses.
>> 
>> On Thu, Mar 3, 2016 at 5:15 PM, davide risso <risso.davide at gmail.com>
>> wrote:
>> 
>>> Hi Hervé and Sean,
>>> 
>>> thanks for your help. It will indeed be interesting to hear how other
>>> people chose the license, especially for those packages that
>>> redistribute a dataset not from their lab.
>>> 
>>> I do have an experimental data package in Bioc, zebrafishRNASeq, but it's
>>> an experiment from a collaborator and at the time I didn't pay much
>>> attention to which license to use.
>>> In this case, I'd like to redistribute data from different labs. I guess
>>> I will contact the original authors at least as a courtesy.
>>> But I'm still keen to hear opinions on which license(s) would be
>>> appropriate for experimental data sharing.
>>> 
>>> Best,
>>> davide
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Mar 3, 2016 at 12:50 PM Hervé Pagès <hpages at fredhutch.org> wrote:
>>> 
>>>> Hi Davide,
>>>> 
>>>>> On 03/01/2016 02:25 PM, davide risso wrote:
>>>>> Dear Bioc developers,
>>>>> 
>>>>> I recently downloaded three publicly available single-cell RNA-seq
>>>>> datasets from the NCBI GEO/SRA repository and created an R package with
>>>>> some gene-level summaries (read counts and FPKMs).
>>>>>
>>>>> I'm currently using the package locally for my own tests, but I think
>>>>> it may be a useful resource for the community, and I'm considering
>>>>> sharing it on GitHub and eventually submitting it to Bioconductor.
>>>>>
>>>>> I was not involved in any way with the original studies, and I'm
>>>>> wondering what the best practice is in terms of license / data sharing.
>>>>> Since there are many experimental data packages in Bioconductor, I'm
>>>>> guessing that I'm not the first person wondering about this.
>>>>> 
>>>>> From the NCBI website, I read (quote from
>>>>> https://www.ncbi.nlm.nih.gov/home/about/policies.shtml):
>>>>> Databases of molecular data on the NCBI Web site include such examples
>>>>> as nucleotide sequences (GenBank), protein sequences, macromolecular
>>>>> structures, molecular variation, gene expression, and mapping data.
>>>>> They are designed to provide and encourage access within the scientific
>>>>> community to sources of current and comprehensive information.
>>>>> Therefore, NCBI itself places no restrictions on the use or
>>>>> distribution of the data contained therein. Nor do we accept data when
>>>>> the submitter has requested restrictions on reuse or redistribution.
>>>>> However, some submitters of the original data (or the country of origin
>>>>> of such data) may claim patent, copyright, or other intellectual
>>>>> property rights in all or a portion of the data (that has been
>>>>> submitted). NCBI is not in a position to assess the validity of such
>>>>> claims and since there is no transfer of rights from submitters to
>>>>> NCBI, NCBI has no rights to transfer to a third party. Therefore, NCBI
>>>>> cannot provide comment or unrestricted permission concerning the use,
>>>>> copying, or distribution of the information contained in the molecular
>>>>> databases.
>>>>> 
>>>>> Should I contact the original authors for permission? Or is the fact
>>>>> that the data were publicly shared enough to grant me permission to
>>>>> redistribute?
>>>>> In that case, is there a standard license that I should use?
>>>>> 
>>>>> Thanks for any feedback / thought!
>>>> 
>>>> I don't have much to offer. AFAIK we don't really have guidelines or
>>>> recommendations for what license to use for experimental data packages,
>>>> except for the usual "make sure you use an appropriate license" advice.
>>>> So far it has really been up to each author/maintainer to make sure
>>>> they pick a license that is compatible with the original
>>>> license/copyright/patent of the data they are packaging
>>>> and with its redistribution through the Bioconductor channel.
>>>> 
>>>> FWIW here is a summary of the licenses used by the 276 experimental
>>>> data packages currently in BioC devel:
>>>> 
>>>>   License       Nb of packages
>>>>   ------------  --------------
>>>>   GPL                      135
>>>>   Artistic-2.0              96
>>>>   LGPL                      41
>>>>   other                      4
>>>> 
>>>> It would be interesting to hear from other developers about this. For
>>>> example, how do people choose between GPL and Artistic-2.0? Is one
>>>> license typically more appropriate for packaging and redistributing
>>>> data that is already publicly available?
>>>> 
>>>> H.
>>>> 
>>>>> 
>>>>> Best,
>>>>> davide
>>>>> 
>>>>>      [[alternative HTML version deleted]]
>>>>> 
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>> 
>>>> 
>>>> --
>>>> Hervé Pagès
>>>> 
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>> 
>>>> E-mail: hpages at fredhutch.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>> 