[Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

Wolfgang Huber wolfgang.huber at embl.de
Tue Sep 19 15:50:10 CEST 2017


My 3 cents:
- I think this is an increasingly common problem; I also run into it in 
everyday work, and it calls for a general solution.
- I agree with Martin that setting R_MAX_NUM_DLLS is better than 
unloading. AFAIK it is not even possible to cleanly unload every package 
('as if it had never been loaded') because of irreversible global effects, 
although I'd be happy to be educated otherwise.
- R_MAX_NUM_DLLS is not a sustainable solution either: the current 
default is 100, but e.g. on my macOS 10.12 any value >152 leads to an 
error. Raising it to that maximum of 152 will give us some temporary 
respite but does not seem future-proof.
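
For anyone who wants to check how close a session is to the limit, here is a minimal sketch in base R (an illustration, not something proposed in this thread). getLoadedDLLs() lists the DLLs currently registered, and Sys.getenv() shows whether R_MAX_NUM_DLLS was set at startup. The setdiff() at the end is only a rough heuristic: it assumes that a package's DLL carries the package's name, which is not always true, and the variable names are purely illustrative.

```r
# DLLs currently registered in this R session (base R, no packages needed).
dll_names <- names(getLoadedDLLs())
n_loaded <- length(dll_names)

# R_MAX_NUM_DLLS is read once at startup; "" means it was not set,
# so R's built-in default applies.
limit_env <- Sys.getenv("R_MAX_NUM_DLLS", unset = "")
limit_txt <- if (nzchar(limit_env)) limit_env else "(unset, default applies)"

message("DLLs loaded: ", n_loaded)
message("R_MAX_NUM_DLLS: ", limit_txt)

# Rough heuristic (assumption: DLL name == package name): DLLs whose
# package namespace is not currently loaded. Internal entries such as
# "(embedding)" can show up here too and are not candidates for unloading.
orphan_candidates <- setdiff(dll_names, loadedNamespaces())
print(orphan_candidates)
```

Running this before and after a block of library() calls gives a quick idea of how many DLLs a given set of packages brings in.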

	Wolfgang

On 19.9.17 12:02, Martin Morgan wrote:
> On 09/18/2017 10:42 PM, Shian Su wrote:
>> Hi Aaron,
>>
>> Would you mind sharing the code for flushing DLLs? This is a problem 
>> that others working with single cells and I have faced.
>>
> 
> For the user encountering this problem I think a better solution is to 
> increase the number of DLLs allowed by R, for instance editing .Renviron 
> to contain the line
> 
> R_MAX_NUM_DLLS=120
> 
> or similar. This can be set on an installation-wide, per-user, or 
> project-specific basis, as described in ?Startup
> 
> @Aaron -- we are still discussing things internally; for instance it is 
> possible to set the maximum number of DLLs in the build system.
> 
> Martin
> 
>> Better yet, would anyone know of code that would allow unused DLLs to be 
>> identified and unloaded? I suspect not, as it would require keeping 
>> track of the dependency tree of your current environment, but I’m hopeful.
>>
>> Kind regards,
>> Shian Su
>>
>>> On 19 Sep 2017, at 12:30 pm, Aaron Lun <alun at wehi.edu.au> wrote:
>>>
>>> Well, inertia won out in the end, and so I've just moved a whole 
>>> stack of packages into "Suggests" for now. This is probably not a 
>>> sustainable solution as the workflow can potentially get larger over 
>>> time; I would prefer to have some formal support for splitting up the 
>>> workflow into modules that can be independently installed.
>>>
>>> -Aaron
>>> ________________________________
>>> From: Vincent Carey <stvjc at channing.harvard.edu>
>>> Sent: Saturday, 16 September 2017 10:08:13 PM
>>> To: Aaron Lun
>>> Cc: Martin Morgan; bioc-devel at r-project.org
>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in 
>>> Jenkins build for singleCellWorkflow
>>>
>>> IMHO the pedagogic value of a unified document that treats a topic 
>>> thoroughly
>>> is quite high.  Building the whole workflow on an arbitrary user's 
>>> system seems to
>>> me to be a lower priority.  Thus using the environment variable in 
>>> the build system
>>> to avoid this limit seems an appropriate solution.
>>>
>>> On Sat, Sep 16, 2017 at 7:43 AM, Aaron Lun 
>>> <alun at wehi.edu.au> wrote:
>>> Thanks Martin. Yes, it's quite unfortunate that scater drags in dplyr 
>>> and ggplot2, which - combined with Bioconductor's core packages - 
>>> already puts us pretty close to the limit without doing anything else!
>>>
>>>
>>> A solution might be to split my workflow into self-contained 
>>> components, each of which can become its own workflow package (e.g., 
>>> simpleSingleCell1, simpleSingleCell2, simpleSingleCell3 and so on). 
>>> This should avoid all of the problems and our associated hacks.
>>>
>>>
>>> I'm happy to do this, but is it possible for the website to indicate 
>>> that there is a connection between the component workflows? For 
>>> example, the link that ordinarily goes to the compiled workflow could 
>>> instead go to an indexing page, which contains links to individual 
>>> component workflows.
>>>
>>>
>>> -Aaron
>>>
>>>
>>> ________________________________
>>> From: Martin Morgan 
>>> <martin.morgan at roswellpark.org>
>>> Sent: Saturday, 16 September 2017 8:18:09 PM
>>> To: Aaron Lun; bioc-devel at r-project.org
>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in 
>>> Jenkins build for singleCellWorkflow
>>>
>>> On 09/16/2017 01:53 AM, Aaron Lun wrote:
>>>> Bumping this rather old thread. To re-iterate, I'm updating my 
>>>> simpleSingleCell workflow and I'm running into R's DLL limit. I've 
>>>> added a code block halfway through the workflow that unloads all 
>>>> DLLs and cleans them out, and this works fine during compilation on 
>>>> my local machine.
>>>>
>>>>
>>>> However, it seems that the BioC workflow builder uses a 
>>>> pre-processing step whereby it first tries to load all packages 
>>>> contained within library() calls. This hits the DLL limit as it 
>>>> doesn't execute the protective code block, which defeats the purpose 
>>>> of all my fiddling in the first place.
>>>>
>>>>
>>>> What options are there? I'm happy to split my workflow into multiple 
>>>> smaller Rmarkdown files that get compiled separately, provided there 
>>>> is appropriate support for this setup from the build system
>>>
>>> The workflows have been standardized as packages. The packages put the
>>> workflow dependencies in the 'Depends:' field, with the idea being that
>>> the user installing the workflow package 'in the usual way' will get the
>>> packages used in the vignette installed in their system 'in the usual
>>> way' without having to execute special variants of biocLite() /
>>> install.packages() / funky code in the vignette itself to be able to
>>> build the vignette.
>>>
>>> Loading a package loads its Depends: (and Imports:) so triggers the 
>>> problem.
>>>
>>> Writing separate vignettes would not help with this (but might make the
>>> workflow more palatable; I'm not 100% sure of support for separate
>>> workflows in a single package, there is no problem with having multiple
>>> workflow packages on the same general topic).
>>>
>>> One could move (some?) packages to Suggests: and use your trick of
>>> unloading packages part-way through the vignette. But then users will
>>> find that they need to install packages to complete the vignette.
>>>
>>> 'We' could add support for a BBS option that increases R_MAX_NUM_DLLS,
>>> but that would only allow the workflow to build on the build system, not
>>> on the users' systems.
>>>
>>> I think also the R-core approach to this
>>> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073529.html,
>>> https://github.com/wch/r-source/commit/757bfa1d7ff373a604d6d34617f9cad78e0c875e) 
>>>
>>> is a little insightful, where one could imagine increasing the default
>>> R_MAX_NUM_DLLS, but apparently on some OS these compete for number of
>>> open files, and this in turn can be quite low.
>>>
>>> I note that users have already struggled with the DLL problem 'in the
>>> wild' https://stackoverflow.com/a/45552926/547331. This seems
>>> particularly problematic for workflows, which are appealing to
>>> relatively novice users.
>>>
>>> At the end of the day I think the workflows should make realistic use of
>>> R resources. I think this means modifying the workflow to use fewer
>>> DLLs. (this general comment is relevant to other workflows, which for
>>> instance start by downloading very large data sets -- I know that less
>>> constrained use of computing resources is supposed to be a selling point
>>> of the workflows, but in excess this seems counter-productive to their
>>> primary use as pedagogic tools [rather than, for instance, comprehensive
>>> exemplars of reproducible research]).
>>>
>>> Maybe there is additional discussion about some of the technical aspects
>>> of workflows that others might contribute.
>>>
>>> Martin
>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> Aaron
>>>>
>>>> ________________________________
>>>> From: Bioc-devel <bioc-devel-bounces at r-project.org> 
>>>> on behalf of Aaron Lun <alun at wehi.edu.au>
>>>> Sent: Wednesday, 21 June 2017 12:09:13 AM
>>>> To: bioc-devel at r-project.org
>>>> Subject: [Untrusted Server]Re: [Bioc-devel] strange error in Jenkins 
>>>> build for singleCellWorkflow
>>>>
>>>> Hi all,
>>>>
>>>>
>>>> I'm getting a curious error in the Jenkins log when I try to build 
>>>> the singleCellWorkflow:
>>>>
>>>>
>>>> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console 
>>>>
>>>>
>>>>
>>>> The key part is at the bottom:
>>>>
>>>>
>>>> Error: package or namespace load failed for 'GenomicFeatures' in 
>>>> dyn.load(file, DLLpath = DLLpath, ...):
>>>>   unable to load shared object 
>>>> '/var/lib/jenkins/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/libs/Rsamtools.so': 
>>>>
>>>>    `maximal number of DLLs reached...
>>>>
>>>>
>>>> The workflow had previously been running fine on the build system; 
>>>> I'm not quite sure what's going on here, given that it's not even 
>>>> failing at the point where I made the latest changes.
>>>>
>>>> Cheers,
>>>>
>>>> Aaron
>>>>
>>>>          [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

-- 
With thanks in advance-
Wolfgang

-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.huber at embl.de
http://www.huber.embl.de


