[Bioc-devel] r+w permissions in release branches

Martin Morgan mtmorgan at fhcrc.org
Wed Apr 23 19:41:42 CEST 2014


On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
> I think we should have a CRAN snapshot (or a subset of CRAN used in Bioc)
> inside each Bioc release; I don't know how hard that is to manage from a
> technical point of view.

I followed this thread with some interest.

It would be surprisingly challenging to update even a 2.13 package -- the build 
machines have moved on to other tasks, unconstrained by the unique system 
dependencies needed for 2.13 builds.

The idea of a 'forever' repository snapshot seems possible, but would the 
snapshot be at the beginning of the release and hence miss the few but important 
bug fixes introduced during the release, or at the end of the release, which 
might be after the time required for the purposes of replication? Either way it 
is certain that the peanut butter would land face down for one's particular 
need. Also, the need for the user to satisfy system dependencies becomes 
increasingly challenging, even with a binary repository. I don't think a central 
'Bioc' solution would really address the problem of reproducibility.

It is not that 'hard' for an individual group to create a snapshot of Bioc and 
CRAN, using rsync

   http://www.bioconductor.org/about/mirrors/mirror-how-to/
   http://cran.r-project.org/mirror-howto.html

and to use install.packages() or even biocLite() to access these (see 
?setRepositories). This would again require that the system dependencies for 
these packages are satisfied in some kind of frozen fashion.
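
As a rough sketch of the second step, assuming the rsync'ed copies already sit 
on local disk: the snapshot path, the sub-directory layout and the Bioc version 
below are all hypothetical and would need to match however the mirror was 
actually created.

```r
# Hypothetical location of a locally rsync'ed snapshot; adjust to your setup.
snapshot_root <- "/data/snapshots/2014-04-23"

# Hypothetical Bioconductor version captured in the snapshot.
bioc_version <- "2.14"

# Build file:// URLs for the frozen CRAN and Bioconductor software
# repositories. The sub-directory names are assumptions about the mirror
# layout, not something the mirror how-to pages guarantee.
snapshot_repos <- c(
    CRAN     = paste0("file://", file.path(snapshot_root, "cran")),
    BioCsoft = paste0("file://",
                      file.path(snapshot_root, "bioc", "packages",
                                bioc_version, "bioc"))
)

# Point this R session at the snapshot instead of the live repositories.
options(repos = snapshot_repos)

# install.packages() consults getOption("repos") by default, so a call such as
#   install.packages("somePackage")
# would now resolve against the frozen snapshot rather than the live mirrors.
print(getOption("repos"))
```

Putting the options() call in a project-level .Rprofile would make the choice 
of snapshot explicit for everyone working on that analysis; it does nothing 
about the system dependencies mentioned above.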

A more robust possibility is of course a virtual machine, such as the AMI (or a 
customized version) we provide

   http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids

although these have only a subset of packages installed by default.

The CRAN thread referenced earlier included this post

   https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html

which I think makes an important distinction between exact replication and 
scientific reproducibility; it is the latter that must be the most interesting, 
and the former that we somehow seem to stumble over. The thread also mentions 
best practices -- version control

   http://bioconductor.org/developers/how-to/source-control/

a disciplined approach to deprecation

   http://bioconductor.org/developers/how-to/deprecation/

package versioning

   http://bioconductor.org/developers/how-to/version-numbering/

and the Bioc-style approach to release that we as developers can act on to 
enhance reproducibility. What other best practices can we more forcefully / 
conveniently adopt within the project?

Martin

>
> Best,
> Kasper
>
>
> On Tue, Apr 22, 2014 at 6:06 PM, Julian Gehring <julian.gehring at embl.de> wrote:
>
>> Hi,
>>
>> For most problems discussed here, it seems that having a fixed version of
>> a package is sufficient rather than a specific version.  If the idea of a
>> snapshot with each bioc release would work (which still means one version
>> per package), so would requiring that version within the package (one would
>> just need to agree which version this is).
>>
>> Best wishes
>>
>> Julian
>>
>>
>>> what if two Bioc packages require different versions of the ‘same’ CRAN
>>> package?
>>> AfaIu, the infrastructure is not designed to deal with multiple versions
>>> of a package.
>>>
>>> Nor would I as a user expect to have less-than-the-most recent versions
>>> of CRAN packages in my library just because some other package says so…
>>>
>>> Just to throw in another, and probably silly suggestion: the Bioconductor
>>> repository could keep ‘snapshots’ of CRAN packages compatible with each
>>> release, but they would have to be name-mangled in some way. The potential
>>> for confusion is enormous.
>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


