[Bioc-devel] strange error in Jenkins build for singleCellWorkflow

Wolfgang Huber wolfgang.huber at embl.de
Tue Sep 19 22:17:54 CEST 2017


19.9.17 18:16, Martin Morgan scripsit:
> On 09/19/2017 09:50 AM, Wolfgang Huber wrote:
>> My 3 cents:
>> - I think this is a more and more common problem that I'm also 
>> encountering in everyday work and that asks for a general solution.
>> - I agree with Martin that setting R_MAX_NUM_DLLS is better than 
>> unloading. AFAIK it is not even possible to cleanly unload every 
>> package ('as if it had never been loaded') due to irreversible global 
>> effects; although I'd be happy to be educated otherwise.
>> - R_MAX_NUM_DLLS is not a sustainable solution either: the current 
>> default is 100, but e.g. on my macOS 10.12 any value >152 leads to an 
>> error. Upping to the maximum of 152 will give us some temporary respite 
>> but does not seem really future-proof.
> 
> This was the R-core motivation for increasing the max to only 100, but 
> it's still surprising to me that a modern OS has such a tight limit. 
> I'll see if there are ideas in R-core.
> 
> From our internal discussions there is some willingness to (continue to) 
> support large and complicated workflows, 

Doesn't need to be particularly large and complicated. Just the 
following in a fresh R session leads to 61 DLLs being loaded:

   library("tidyverse")
   library("DESeq2")
   .dynLibs()

and when I add a

   library("caret")

we're up to 72, and with "scater" added, at 78.
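
The counts above can be compared against the configured limit with a small
helper. This is only a minimal sketch: `dll_headroom` is a hypothetical name,
the fallback of 100 is the default quoted earlier in this thread, and it
assumes that every entry returned by getLoadedDLLs() counts toward the limit.
Note that getLoadedDLLs() also lists the DLLs of base packages, so its count
may differ from that of .dynLibs().

```r
# Report how many DLLs are loaded and how much headroom is left.
# `dll_headroom` is a hypothetical helper, not part of R itself.
dll_headroom <- function(default_limit = 100L) {
  # R reads R_MAX_NUM_DLLS at startup; an empty string means it was not set.
  configured <- Sys.getenv("R_MAX_NUM_DLLS", unset = "")
  limit <- if (nzchar(configured)) as.integer(configured) else default_limit
  # Number of DLLs currently registered in this session.
  loaded <- length(getLoadedDLLs())
  list(loaded = loaded, limit = limit, remaining = limit - loaded)
}

info <- dll_headroom()
message(sprintf("%d DLLs loaded, limit %d, %d remaining",
                info$loaded, info$limit, info$remaining))
```

Calling dll_headroom() before and after each library() call shows how much
each package (with its dependencies) contributes. Changing the variable with
Sys.setenv() in a running session would change the reported limit but not the
limit R actually enforces, since that is fixed at startup.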

	Best-
		Wolfgang

> but it is valuable to think
> carefully about the consequences for users following along. Maybe part 
> of this is clearly alerting the user to the fact that 500G of data are 
> going to be downloaded, the workflow requires advanced configuration of 
> R, etc.
> 
> @Aaron -- if you'd like to continue with one work flow, contact Herve 
> (cc'd) and he'll provide the .BBSoptions configuration to allow the 
> build system to use an appropriate R_MAX_NUM_DLLS. If instead you'd like 
> to produce two workflows, then the best strategy in your case would be 
> to simply have two independent packages (DESCRIPTION + vignettes/) each 
> with more modest numbers of DLLs; contact Lori (cc'd) when you've 
> decided on a second name, and we'll create the svn location for you.
> 
> Martin
> 
>>
>>      Wolfgang
>>
>> 19.9.17 12:02, Martin Morgan scripsit:
>>> On 09/18/2017 10:42 PM, Shian Su wrote:
>>>> Hi Aaron,
>>>>
>>>> Would you mind sharing the code for flushing DLLs? This is a problem 
>>>> that others working with single cells and I have faced.
>>>>
>>>
>>> For the user encountering this problem I think a better solution is 
>>> to increase the number of DLLs allowed by R, for instance editing 
>>> .Renviron to contain the line
>>>
>>> R_MAX_NUM_DLLS=120
>>>
>>> or similar. This can be on an installation-wide, user-wide, or 
>>> project-specific basis, as described in ?Startup
>>>
>>> @Aaron -- we are still discussing things internally; for instance it 
>>> is possible to set the maximum number of DLLs in the build system.
>>>
>>> Martin
>>>
>>>> Better yet, would anyone know of code that would allow unused DLLs to 
>>>> be identified and unloaded? I suspect not as it would require 
>>>> keeping track of the dependency tree of your current environment but 
>>>> I’m hopeful.
>>>>
>>>> Kind regards,
>>>> Shian Su
>>>>
>>>>> On 19 Sep 2017, at 12:30 pm, Aaron Lun <alun at wehi.edu.au> wrote:
>>>>>
>>>>> Well, inertia won out in the end, and so I've just moved a whole 
>>>>> stack of packages into "Suggests" for now. This is probably not a 
>>>>> sustainable solution as the workflow can potentially get larger 
>>>>> over time; I would prefer to have some formal support for splitting 
>>>>> up the workflow into modules that can be independently installed.
>>>>>
>>>>> -Aaron
>>>>> ________________________________
>>>>> From: Vincent Carey <stvjc at channing.harvard.edu>
>>>>> Sent: Saturday, 16 September 2017 10:08:13 PM
>>>>> To: Aaron Lun
>>>>> Cc: Martin Morgan; bioc-devel at r-project.org
>>>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in 
>>>>> Jenkins build for singleCellWorkflow
>>>>>
>>>>> IMHO the pedagogic value of a unified document that treats a topic 
>>>>> thoroughly
>>>>> is quite high.  Building the whole workflow on an arbitrary user's 
>>>>> system seems to
>>>>> me to be a lower priority.  Thus using the environment variable in 
>>>>> the build system
>>>>> to avoid this limit seems an appropriate solution.
>>>>>
>>>>> On Sat, Sep 16, 2017 at 7:43 AM, Aaron Lun 
>>>>> <alun at wehi.edu.au<mailto:alun at wehi.edu.au>> wrote:
>>>>> Thanks Martin. Yes, it's quite unfortunate that scater drags in 
>>>>> dplyr and ggplot2, which - combined with Bioconductor's core 
>>>>> packages - already puts us pretty close to the limit without doing 
>>>>> anything else!
>>>>>
>>>>>
>>>>> A solution might be to split my workflow into self-contained 
>>>>> components, each of which can become its own workflow package 
>>>>> (e.g., simpleSingleCell1, simpleSingleCell2, simpleSingleCell3 and 
>>>>> so on). This should avoid all of the problems and our associated 
>>>>> hacks.
>>>>>
>>>>>
>>>>> I'm happy to do this, but is it possible for the website to 
>>>>> indicate that there is a connection between the component 
>>>>> workflows? For example, the link that ordinarily goes to the 
>>>>> compiled workflow could instead go to an indexing page, which 
>>>>> contains links to individual component workflows.
>>>>>
>>>>>
>>>>> -Aaron
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Martin Morgan 
>>>>> <martin.morgan at roswellpark.org<mailto:martin.morgan at roswellpark.org>>
>>>>> Sent: Saturday, 16 September 2017 8:18:09 PM
>>>>> To: Aaron Lun; 
>>>>> bioc-devel at r-project.org<mailto:bioc-devel at r-project.org>
>>>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in 
>>>>> Jenkins build for singleCellWorkflow
>>>>>
>>>>> On 09/16/2017 01:53 AM, Aaron Lun wrote:
>>>>>> Bumping this rather old thread. To re-iterate, I'm updating my 
>>>>>> simpleSingleCell workflow and I'm running into R's DLL limit. I've 
>>>>>> added a code block halfway through the workflow that unloads all 
>>>>>> DLLs and cleans them out, and this works fine during compilation 
>>>>>> on my local machine.
>>>>>>
>>>>>>
>>>>>> However, it seems that the BioC workflow builder uses a 
>>>>>> pre-processing step whereby it first tries to load all packages 
>>>>>> contained within library() calls. This hits the DLL limit as it 
>>>>>> doesn't execute the protective code block, which defeats the 
>>>>>> purpose of all my fiddling in the first place.
>>>>>>
>>>>>>
>>>>>> What options are there? I'm happy to split my workflow into 
>>>>>> multiple smaller Rmarkdown files that get compiled separately, 
>>>>>> provided there is appropriate support for this setup from the 
>>>>>> build system
>>>>>
>>>>> The workflows have been standardized as packages. The packages put the
>>>>> workflow dependencies in the 'Depends:' field, with the idea being 
>>>>> that
>>>>> the user installing the workflow package 'in the usual way' will 
>>>>> get the
>>>>> packages used in the vignette installed in their system 'in the usual
>>>>> way' without having to execute special variants of biocLite() /
>>>>> install.packages() / funky code in the vignette itself to be able to
>>>>> build the vignette.
>>>>>
>>>>> Loading a package loads its Depends: (and Imports:) so triggers the 
>>>>> problem.
>>>>>
>>>>> Writing separate vignettes would not help with this (but might make 
>>>>> the
>>>>> workflow more palatable; I'm not 100% sure of support for separate 
>>>>> work
>>>>> flows in a single package, there is no problem with having multiple
>>>>> workflow packages on the same general topic).
>>>>>
>>>>> One could move (some?) packages to Suggests: and use your trick of
>>>>> unloading packages part-way through the vignette. But then users will
>>>>> find that they need to install packages to complete the vignette.
>>>>>
>>>>> 'We' could add support for a BBS option that increases R_MAX_NUM_DLLS,
>>>>> but that would allow the workflow to build only on the build system,
>>>>> not on the users' systems.
>>>>>
>>>>> I think also the R-core approach to this
>>>>> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073529.html,
>>>>> https://github.com/wch/r-source/commit/757bfa1d7ff373a604d6d34617f9cad78e0c875e) 
>>>>>
>>>>> is a little insightful, where one could imagine increasing the default
>>>>> R_MAX_NUM_DLLS, but apparently on some OS these compete for number of
>>>>> open files, and this in turn can be quite low.
>>>>>
>>>>> I note that users have already struggled with the DLL problem 'in the
>>>>> wild' https://stackoverflow.com/a/45552926/547331. This seems
>>>>> particularly problematic for workflows, which are appealing to
>>>>> relatively novice users.
>>>>>
>>>>> At the end of the day I think the workflows should make realistic 
>>>>> use of
>>>>> R resources. I think this means modifying the workflow to use fewer
>>>>> DLLs. (this general comment is relevant to other workflows, which for
>>>>> instance start by downloading very large data sets -- I know that less
>>>>> constrained use of computing resources is supposed to be a selling 
>>>>> point
>>>>> of the workflows, but in excess this seems counter-productive to their
>>>>> primary use as pedagogic tools [rather than, for instance, 
>>>>> comprehensive
>>>>> exemplars of reproducible research]).
>>>>>
>>>>> Maybe there is additional discussion about some of the technical 
>>>>> aspects
>>>>> of workflows that others might contribute.
>>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> ________________________________
>>>>>> From: Bioc-devel 
>>>>>> <bioc-devel-bounces at r-project.org<mailto:bioc-devel-bounces at r-project.org>> 
>>>>>> on behalf of Aaron Lun <alun at wehi.edu.au<mailto:alun at wehi.edu.au>>
>>>>>> Sent: Wednesday, 21 June 2017 12:09:13 AM
>>>>>> To: bioc-devel at r-project.org<mailto:bioc-devel at r-project.org>
>>>>>> Subject: [Untrusted Server]Re: [Bioc-devel] strange error in 
>>>>>> Jenkins build for singleCellWorkflow
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>> I'm getting a curious error in the Jenkins log when I try to build 
>>>>>> the singleCellWorkflow:
>>>>>>
>>>>>>
>>>>>> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console 
>>>>>>
>>>>>>
>>>>>>
>>>>>> The key part is at the bottom:
>>>>>>
>>>>>>
>>>>>> Error: package or namespace load failed for 'GenomicFeatures' in 
>>>>>> dyn.load(file, DLLpath = DLLpath, ...):
>>>>>>   unable to load shared object 
>>>>>> '/var/lib/jenkins/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/libs/Rsamtools.so': 
>>>>>>
>>>>>>    `maximal number of DLLs reached...
>>>>>>
>>>>>>
>>>>>> The workflow had previously been running fine on the build system; 
>>>>>> I'm not quite sure what's going on here, given that it's not even 
>>>>>> failing at the point where I made the latest changes.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>>          [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing 
>>>>>> list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> This email message may contain legally privileged and/or 
>>>>> confidential information.  If you are not the intended 
>>>>> recipient(s), or the employee or agent responsible for the delivery 
>>>>> of this message to the intended recipient(s), you are hereby 
>>>>> notified that any disclosure, copying, distribution, or use of this 
>>>>> email message is prohibited. If you have received this message in 
>>>>> error, please notify the sender immediately by e-mail and delete 
>>>>> this email message from your computer. Thank you.
>>>>
>>>
>>
> 

-- 
With thanks in advance-
Wolfgang

-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.huber at embl.de
http://www.huber.embl.de


