[Bioc-devel] PPA with built bioconductor packages (for continuous integration)

Mon Nov 10 18:49:19 CET 2014

On Mon, Nov 10, 2014 at 12:19 PM, Martin Morgan <mtmorgan at fredhutch.org>
wrote:

> On 11/09/2014 11:06 AM, Dan Tenenbaum wrote:
>
>>
>>
>> ----- Original Message -----
>>
>>> From: "Martin Morgan" <mtmorgan at fredhutch.org>
>>> To: "Laurent Gautier" <lgautier at gmail.com>, bioc-devel at r-project.org
>>> Sent: Sunday, November 9, 2014 8:26:48 AM
>>> Subject: Re: [Bioc-devel] PPA with built bioconductor packages (for
>>> continuous integration)
>>>
>>> On 11/09/2014 07:23 AM, Laurent Gautier wrote:
>>>
>>>> Hi,
>>>>
>>>> Continuous integration is a convenient way to automate some of the
>>>> steps
>>>> necessary to ensure quality software.
>>>>
>>>> Popular ways to do it create a vanilla virtual machine (VM) with a
>>>> Linux
>>>> distribution, and scripts prepare the VM with 3rd-party
>>>> dependencies
>>>> required by the software. For example, the popular CI system Travis
>>>> for
>>>> github creates by default a VM running ubuntu, and dependencies can
>>>> be
>>>> installed with `apt-get install`.
>>>>
>>>> When developing software that requires CRAN/bioconductor, the
>>>> latest R is
>>>> available precompiled but the R packages must be downloaded and
>>>> installed from
>>>> source.
>>>>
>>>> This can take a relatively long time. On a recent project over 80%
>>>> of the
>>>> time is spent downloading/installing the R/BioC packages. The
>>>> remainder is
>>>> building the code and running the unit tests.
>>>>
>>>> Having a Personal Package Archive (PPA) with bioconductor packages
>>>> already
>>>> compiled would both speed up the process and make the use of
>>>> continuous
>>>> integration by projects relying on bioconductor packages easier.
>>>>
>>>> Is this something others would like to have, and is this something
>>>> that
>>>> bioconductor would see as part of its mission to provide / help provide
>>>> quality
>>>> software and be able to host ?
>>>>
>>>
>>> It would be interesting to catalogue objectives (e.g., development
>>> vs.
>>> reproducibility) and available alternatives (e.g., PPA, docker /
>>> Rocker, AMI,
>>> existing or possible cloud services [such as the Bioc 'single package
>>> builder'
>>> used to build and check new package submissions, or travis itself],
>>> the Becker
>>> repository management scheme Michael and Gabe mention, ...);
>>>
>>
>>
>> Just to add to the mix of options, it's possible to run
>> R CMD INSTALL --build on a source tarball on Linux and it will create a
>> 'binary' version that is already compiled.
>>
>
> These binaries are in general not portable, either within or between
> distributions, e.g., because the user has a different version of a system
> dependency than the one the binary was built against.

This is my main concern with these distributed binaries. Semi-institutional
packaging of Linux binaries by other parties (e.g., Debian) appears adequate.

It does seem to me that there is an important but perhaps inherently elusive
concept underlying this discussion. There is some scale of software or
functionality distribution bigger than a package but smaller than a
repository that we would like to identify and then take infrastructural
steps so that this entity is more conveniently distributed/acquired/managed.
I find the Bioc AMI reasonably tractable: I know how to extend it and make
clusters of it. So I consider it a nice unit of management/distribution for
a customized environment that I know can be the basis for a scalable
solution of a specific problem. However, I really only know how to use it in
the context of the EC2 ecosystem, which incurs payments to Amazon.
Theoretically (for me; for others it is probably practically trivial) the
AMI (more importantly, its extensions) could be exported for use in a local
setting. I gather that there are very lightweight paths in this direction
involving recipes.

How many of our users/developers would appreciate some tutelage in this
domain I cannot say. Once you get into this arena you have to assemble lots
of relevant expertise, and be able to deal with local idiosyncrasies, and so
it may not be worth our while to do much more than is currently being done.

>
>
> Martin
>
>
>> The problem with this is (AFAIK) there is no corresponding package type
>> that can be used with install.packages();
>> otherwise the simplest solution would be to add a CRAN-style repos
>> containing these "binaries". Maybe R could be patched to allow this?
>> But it's possible that the requirements for Linux "binaries" could vary
>> depending on many things: cpu type (intel or solaris, or...),
>> architecture (i386, x64), presence/absence of BLAS/LAPACK, etc etc etc.
>> This suggests that a vm or container-based approach might be better.
>>
>> Dan
>>
>>
>>
>>
>>> if there
>>> is a clear
>>> path forward satisfying some plurality of users without too many
>>> technical
>>> obstacles then it might fall within the Bioc purview; my initial
>>> sense is that
>>> there is not a consensus on use cases or viable implementations, but
>>> I can be
>>> convinced otherwise...
>>>
>>> In terms of Tim's post, getting your colleague to use a PPA /
>>> existing
>>> alternative (e.g., the Bioc AMI,
>>> http://bioconductor.org/help/bioconductor-cloud-ami/ which comes with
>>> Rstudio
>>> server installed...) is not likely to be easier / faster than getting
>>> them to
>>> download / install relevant R / Bioc packages. One interesting
>>> possibility is a
>>> 'hosted' bioconductor with sufficient computational resources on the
>>> back-end
>>> and Rstudio server on the front end; this is not impossible to
>>> imagine seeking
>>> funding for.
>>>
>>>
>>
>>
>>> Martin
>>>
>>>
>>>> Best,
>>>>
>>>> Laurent
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>
>>> --
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>>
>>>
>>>