[Bioc-devel] Dependencies in Bioconductor dockers

Tue Sep 1 08:50:10 CEST 2015

On 08/31/2015 09:52 AM, Laurent Gatto wrote:
> On 29 August 2015 01:19, Martin Morgan wrote:
>
>> On 08/28/2015 02:51 PM, Dan Tenenbaum wrote:
>>>
>>> ----- Original Message -----
>>>> From: "Laurent Gatto" <lg390 at cam.ac.uk>
>>>> To: "Dan Tenenbaum" <dtenenba at fredhutch.org>
>>>> Cc: "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>, "bioC-devel" <bioc-devel at stat.math.ethz.ch>, "Laurent Gatto" <lg390 at cam.ac.uk>
>>>> Sent: Friday, August 28, 2015 2:28:29 PM
>>>> Subject: Re: [Bioc-devel] Dependencies in Bioconductor dockers
>>>>
>>>>
>>>> On 28 August 2015 20:42, Dan Tenenbaum wrote:
>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
>>>>>> To: "Laurent Gatto" <lg390 at cam.ac.uk>
>>>>>> Cc: "bioC-devel" <bioc-devel at stat.math.ethz.ch>
>>>>>> Sent: Wednesday, August 26, 2015 2:36:08 PM
>>>>>> Subject: Re: [Bioc-devel] Dependencies in Bioconductor dockers
>>>>>>
>>>>>> This might be especially nice if we use the docker containers for
>>>>>> R CMD check.
>>>>>>
>>>>> In this case, you would be checking your own package, right, so the
>>>>> docker image cannot know in advance what the Suggests dependencies of
>>>>> your package are.
>>>>>
>>>>> [More below].
>>>>>
>>>>>
>>>>>> On Wed, Aug 26, 2015 at 10:56 PM, Laurent Gatto <lg390 at cam.ac.uk>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> As far as I can see, the Suggests dependencies of a package are not
>>>>>>> included in the docker containers. Would you consider adding these? It
>>>>>>> would be nice to be able to run all examples and vignette code of the
>>>>>>> packages available in a container.
>>>>>
>>>>> Adding the Suggests dependencies of all packages installed on the
>>>>> image is going to make the image much bigger. This request comes soon
>>>>> after other requests to reduce the size of the images. We should
>>>>> probably have a wider discussion and decide exactly what type of
>>>>> docker images we want to have.
>>>>>
>>>>> Use cases that have been mentioned are:
>>>>>
>>>>>     - an image for building/checking with travis (sounds similar to
>>>>>       Kasper's request above).  For this one in particular, small
>>>>>       size is important as Travis has to build its environment from
>>>>>       scratch every time, and loading large images takes too long.
>>>>>     - an image that has the Suggests dependencies of all installed
>>>>>       packages installed.
>>>>>
>>>>> We might want to pick a different way to decide what packages are
>>>>> installed on a given image.  Currently we install all packages with a
>>>>> given biocView (Sequencing for example) and this leads to very large
>>>>> images (sequencing = ~7.5GB).
>>>> Thank you for these clarifications, Dan.
>>>>
>>>> If there is interest in having full/complete containers in addition to
>>>> requiring light ones, would it make sense to distribute both? Would that
>>>> be much overhead?
>>>>
>>> I think it definitely makes sense to distribute the light containers. (and even then, I want to see how small a 'light' container is--one that contains R, LaTeX, and every system dependency that we know about)
>>> I am a little hesitant to make the existing bloated containers even bigger by adding all the Suggests dependencies. That's why I said we might want to revisit the way we decide what packages are on a given container. Right now we use biocViews (Microarray, Sequencing, Proteomics, FlowCytometry) but that results in huge containers containing many packages that people arguably don't use that much but just happen to have the correct biocView. Of course it does have the benefit of being a somewhat democratic method.
>>>
>> I don't really know what I'm talking about, but does it make sense to think of
>> the docker images provided by Bioconductor as building blocks for more
>> specialized containers? i.e., that it should not be 'hard' for a developer to
>> make an image that is appropriate for their particular needs?
>>
>> It seems like there's value to some level of nimbleness provided by small
>> container size. I also wonder about LaTeX -- it seems like HTML vignettes are
>> way better, and since docker images are forward-looking, maybe the images should
>> be provisioned with the notion that they'll support HTML?
>>
>> Maybe there could be a docker-factory script that would take the name of a base
>> image and the path to a package repository, and create a derived image with the
>> additional necessary dependencies?
> That sounds like a great idea. It would still be nice if Bioconductor
> kept the topic specific containers (flow, microarrays, proteomics,
> sequencing).
>
> Laurent
>
>

I can pitch in a viewpoint here... I'm doing basically exactly this. 
I've created several of my own Dockerfiles, which essentially use the 
base bioconductor images, and then build on these various combinations 
of packages that I need; one for production, one for development, etc.

I even wrote a few "R setup" scripts that just take a list of packages, 
and then install these into a new container on top of the bioconductor 
base images. Seems like almost exactly what you're describing, actually.
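
As a rough illustration of the idea (not the actual scripts mentioned
above), the sketch below takes a base image name and a list of packages
and renders the text of a Dockerfile for a derived image. It is written
in Python purely for illustration. The function name `render_dockerfile`
and the image name used in the usage note are assumptions, and so is the
use of `BiocManager::install()`: images from the time of this thread
would call `biocLite()` instead.

```python
import json
import re

# Conservative pattern for R package names (letters, digits and dots).
_PKG_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9.]*$")


def render_dockerfile(base_image, packages):
    """Return Dockerfile text that installs `packages` on top of `base_image`.

    Hypothetical helper: assumes the base image already provides R and the
    BiocManager package.
    """
    if not packages:
        raise ValueError("need at least one package")
    for name in packages:
        if not _PKG_NAME.match(name):
            raise ValueError(f"suspicious package name: {name!r}")
    # Build an R character vector such as c("limma", "GenomicRanges").
    r_vector = "c(" + ", ".join(f'"{p}"' for p in packages) + ")"
    r_code = f"BiocManager::install({r_vector}, ask = FALSE, update = FALSE)"
    # The JSON (exec) form of RUN sidesteps shell quoting of the R expression.
    run = json.dumps(["Rscript", "-e", r_code])
    return f"FROM {base_image}\nRUN {run}\n"
```

For example, `render_dockerfile("bioconductor/release_base", ["limma"])`
returns a two-line Dockerfile that can be written to disk and passed to
`docker build`. Keeping one package list per purpose (production,
development, and so on) gives one small derived image per list.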

I don't think it's really reasonable to expect Bioconductor to create 
docker images like this for every possible use case; but providing a 
base image is very useful, and then people (like me) can use it to 
build our own containers, with whatever packages we require. We could 
even write a tutorial on how to do this...
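
One way to decide which packages a derived image needs is to read them
from the package's own DESCRIPTION file, including the Suggests field
that a stock image cannot know in advance. The sketch below is a
minimal Python illustration under that assumption; the function names
are hypothetical, and it only handles the usual "Field: value" layout
with indented continuation lines.

```python
import re

# Dependency fields to collect, in order of appearance in the result.
DEP_FIELDS = ("Depends", "Imports", "Suggests")


def parse_description(text):
    """Parse DESCRIPTION text into {field: value}, joining continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Indented line: continues the value of the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            current, value = line.split(":", 1)
            current = current.strip()
            fields[current] = value.strip()
    return fields


def dependencies(text, which=DEP_FIELDS):
    """Return package names from the chosen fields, without duplicates."""
    fields = parse_description(text)
    found = []
    for field in which:
        for entry in fields.get(field, "").split(","):
            # Drop version constraints such as "(>= 1.2.0)".
            name = re.sub(r"\(.*?\)", "", entry).strip()
            # "R" itself is a version requirement, not an installable package.
            if name and name != "R" and name not in found:
                found.append(name)
    return found
```

The resulting list can then be handed to whatever installs packages into
the derived image. Passing `which=("Suggests",)` returns only the
suggested packages.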

I don't think it's particularly useful to make huge containers, however 
democratic, with all packages of a certain type, honestly.

It's a work in progress, but my repo with a couple of Dockerfiles and 
setup scripts that does this is here, if anyone is interested: 
https://github.com/sheffien/docker

-Nathan