[Bioc-devel] License question for experimental data package

Hervé Pagès hpages at fredhutch.org
Thu Mar 3 21:49:58 CET 2016

Hi Davide,

On 03/01/2016 02:25 PM, davide risso wrote:
> Dear Bioc developers,
> I recently downloaded three publicly available single-cell RNA-seq datasets
> from the NCBI GEO/SRA repository and created an R package with some
> gene-level summaries (read counts and FPKMs).
> I'm currently using the package locally for my own tests, but I'm thinking
> that this may be a useful resource for the community and thinking of
> sharing it on GitHub and eventually submitting it to Bioconductor.
> I was not involved in any way with the original studies, and I'm wondering
> what is the best practice in terms of license / data sharing. Since there
> are many experimental data packages in Bioconductor, I'm guessing that I'm
> not the first person wondering about this.
> From the NCBI website, I read (quote from
> https://www.ncbi.nlm.nih.gov/home/about/policies.shtml):
> Databases of molecular data on the NCBI Web site include such examples as
> nucleotide sequences (GenBank), protein sequences, macromolecular
> structures, molecular variation, gene expression, and mapping data. They
> are designed to provide and encourage access within the scientific
> community to sources of current and comprehensive information. Therefore,
> NCBI itself places no restrictions on the use or distribution of the data
> contained therein. Nor do we accept data when the submitter has requested
> restrictions on reuse or redistribution. However, some submitters of the
> original data (or the country of origin of such data) may claim patent,
> copyright, or other intellectual property rights in all or a portion of the
> data (that has been submitted). NCBI is not in a position to assess the
> validity of such claims and since there is no transfer of rights from
> submitters to NCBI, NCBI has no rights to transfer to a third party.
> Therefore, NCBI cannot provide comment or unrestricted permission
> concerning the use, copying, or distribution of the information contained
> in the molecular databases.
> Should I contact the original authors for permission? Or is the fact that
> the data were publicly shared enough to grant me permission to redistribute?
> In that case, is there a standard license that I should use?
> Thanks for any feedback / thought!

I don't have much to offer. AFAIK we don't really have guidelines or
recommendations for what license to use for experimental data packages,
except for the usual "make sure you use an appropriate license" advice.
So far it has really been up to each author/maintainer to make sure
they pick a license that is compatible with the
license/copyright/patent terms of the original data they are packaging
and with its redistribution through the Bioconductor channel.

FWIW here is a summary of the licenses used by the 276 experimental
data packages currently in BioC devel:

   License       Nb of packages
   ------------  --------------
   GPL                      135
   Artistic-2.0              96
   LGPL                      41
   other                      4
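
For anyone who wants to produce a similar tally for their own set of
packages, here is a rough sketch in Python. It is not how the numbers
above were obtained. The function names, the grouping rules, and the
example License strings are all assumptions made up for illustration;
the idea is simply to map each DESCRIPTION "License" value to a coarse
family and count.

```python
from collections import Counter


def license_family(license_field):
    """Map a DESCRIPTION 'License' value to a coarse family.

    The grouping rules are a hypothetical simplification, chosen only
    to mirror the four rows of the summary table.
    """
    text = license_field.strip()
    # Check LGPL before GPL so the two families are tested explicitly.
    if text.startswith("LGPL"):
        return "LGPL"
    if text.startswith("GPL"):
        return "GPL"
    if text.startswith("Artistic-2.0"):
        return "Artistic-2.0"
    return "other"


def tally_licenses(license_fields):
    """Count how many License values fall into each family."""
    return Counter(license_family(field) for field in license_fields)


# Made-up License values, not taken from any real package.
example_fields = ["GPL (>= 2)", "Artistic-2.0", "LGPL-3", "GPL-2", "CC BY 4.0"]
counts = tally_licenses(example_fields)

# Print the families from most to least frequent.
for family, n in counts.most_common():
    print(f"{family:<12}  {n:>3}")
```

Dual-licensed or file-based License values (e.g. "file LICENSE") would
need extra rules; here they would simply fall under "other".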

It would be interesting to hear from other developers about this. For
example, how do people choose between GPL and Artistic-2.0? Is one
license typically more appropriate for packaging and redistributing
data that is already publicly available?


> Best,
> davide

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
