[Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re: strange error in Jenkins build for singleCellWorkflow

Hervé Pagès hpages at fredhutch.org
Thu Sep 21 07:06:18 CEST 2017


Hi,

@Martin: It's good news that the workflows have been standardized as
packages but aren't we still using the traditional workflow builder?
AFAIK .BBSoptions files are only honoured on the main build system
(a.k.a. BBS).

@Aaron: If we decide to use BBS (our main build system) to build the
workflows, then you'll be able to control R_MAX_NUM_DLLS by adding
the following lines to your .BBSoptions file:

RbuildPrepend: R_MAX_NUM_DLLS=150
RbuildPrepend.win: set R_MAX_NUM_DLLS=150&&
RcheckPrepend: R_MAX_NUM_DLLS=150
RcheckPrepend.win: set R_MAX_NUM_DLLS=150&&

You might not need all of them, but it doesn't hurt to have them
all. Note that you should not put a space before && in the
RbuildPrepend.win or RcheckPrepend.win value: the Windows shell would
treat that space as part of the value assigned to R_MAX_NUM_DLLS.
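
In case the prefix form used in the non-Windows lines is unfamiliar,
here is a minimal sketch of the shell behaviour they rely on. It assumes
(this is not confirmed anywhere in this thread) that the build system
simply puts the Prepend value in front of the R command it runs. The
`sh -c '...'` call is only a stand-in for that R command, not what BBS
actually executes.

```shell
#!/bin/sh
# Start from a clean state so the comparison below is meaningful.
unset R_MAX_NUM_DLLS

# A "VAR=value" prefix exports the variable to that one command only.
# The child shell here stands in for 'R CMD build' / 'R CMD check'.
with_prefix=$(R_MAX_NUM_DLLS=150 sh -c 'echo "${R_MAX_NUM_DLLS:-unset}"')

# The same command without the prefix does not see the variable.
without_prefix=$(sh -c 'echo "${R_MAX_NUM_DLLS:-unset}"')

echo "with prefix:    $with_prefix"     # prints: with prefix:    150
echo "without prefix: $without_prefix"  # prints: without prefix: unset
```

The variable is visible only to the prefixed command, so the setting
does not leak into the rest of the build environment.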

H.

On 09/19/2017 05:51 PM, Aaron Lun wrote:
> Thanks Martin. I think I will stick to one workflow for now, until the
> BioC-workflows page provides some formal support for multiple workflows
> representing different components of the same workflow (i.e., other than
> me manually writing in the abstract that "This workflow is based on the
> concepts introduced in the previous workflow X").
>
>
> @Herve can you help me out with the .BBSoptions configuration for
> R_MAX_NUM_DLLS? I guess we should also indicate to the user that this
> needs to be increased in order for the workflow to run.
>
>
> -Aaron
>
>
>
> ------------------------------------------------------------------------
> *From:* Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> Martin Morgan <martin.morgan at roswellpark.org>
> *Sent:* Wednesday, 20 September 2017 2:16 AM
> *To:* Wolfgang Huber; bioc-devel at r-project.org
> *Subject:* Re: [Bioc-devel] [Untrusted Server]Re: [Untrusted Server]Re:
> strange error in Jenkins build for singleCellWorkflow
> On 09/19/2017 09:50 AM, Wolfgang Huber wrote:
>>
>> My 3 cents:
>> - I think this is a more and more common problem that I'm also
>> encountering in everyday work and that asks for a general solution.
>> - I agree with Martin that setting R_MAX_NUM_DLLS is better than
>> unloading. AFAIK it is not even possible to cleanly unload every package
>> ('as if it had never been loaded') due to irreversible global effects;
>> although I'd be happy to be educated otherwise.
>> - R_MAX_NUM_DLLS is not a sustainable solution either: the current
>> default is 100, but e.g. on my MacOS 10.12 any value >152 leads to an
>> error. Upping to the maximum 152 will give us some temporary respite but
>> seems not really future-proof.
>
> This was the R-core motivation for increasing the max to only 100, but
> it's still surprising to me that a modern OS has such a tight limit.
> I'll see if there are ideas in R-core.
>
> From our internal discussions there is some willingness to (continue)
> supporting large and complicated work flows, but it is valuable to think
> carefully about the consequences for users following along. Maybe part
> of this is clearly alerting the user to the fact that 500G of data are
> going to be downloaded, the workflow requires advanced configuration of
> R, etc.
>
> @Aaron -- if you'd like to continue with one work flow, contact Herve
> (cc'd) and he'll provide the .BBSoptions configuration to allow the
> build system to use an appropriate R_MAX_NUM_DLLS. If instead you'd like
> to produce two workflows, then the best strategy in your case would be
> to simply have two independent packages (DESCRIPTION + vignettes/) each
> with more modest numbers of DLLs; contact Lori (cc'd) when you've
> decided on a second name, and we'll create the svn location for you.
>
> Martin
>
>>
>>      Wolfgang
>>
>> 19.9.17 12:02, Martin Morgan scripsit:
>>> On 09/18/2017 10:42 PM, Shian Su wrote:
>>>> Hi Aaron,
>>>>
>>>> Would you mind sharing the code for flushing DLLs? This is a problem
>>>> that others working with single cells and I have faced.
>>>>
>>>
>>> For the user encountering this problem I think a better solution is to
>>> increase the number of DLLs allowed by R, for instance editing
>>> .Renviron to contain the line
>>>
>>> R_MAX_NUM_DLLS=120
>>>
>>> or similar. This can be on an installation-wide, per-user, or
>>> project-specific basis, as described in ?Startup
>>>
>>> @Aaron -- we are still discussing things internally; for instance it
>>> is possible to set the maximum number of DLLs in the build system.
>>>
>>> Martin
>>>
>>>> Better yet, would anyone know of code that would allow unused DLLs to
>>>> be identified and unloaded? I suspect not as it would require keeping
>>>> track of the dependency tree of your current environment but I’m
>>>> hopeful.
>>>>
>>>> Kind regards,
>>>> Shian Su
>>>>
>>>>> On 19 Sep 2017, at 12:30 pm, Aaron Lun <alun at wehi.edu.au> wrote:
>>>>>
>>>>> Well, inertia won out in the end, and so I've just moved a whole
>>>>> stack of packages into "Suggests" for now. This is probably not a
>>>>> sustainable solution as the workflow can potentially get larger over
>>>>> time; I would prefer to have some formal support for splitting up
>>>>> the workflow into modules that can be independently installed.
>>>>>
>>>>> -Aaron
>>>>> ________________________________
>>>>> From: Vincent Carey <stvjc at channing.harvard.edu>
>>>>> Sent: Saturday, 16 September 2017 10:08:13 PM
>>>>> To: Aaron Lun
>>>>> Cc: Martin Morgan; bioc-devel at r-project.org
>>>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in
>>>>> Jenkins build for singleCellWorkflow
>>>>>
>>>>> IMHO the pedagogic value of a unified document that treats a topic
>>>>> thoroughly
>>>>> is quite high.  Building the whole workflow on an arbitrary user's
>>>>> system seems to
>>>>> me to be a lower priority.  Thus using the environment variable in
>>>>> the build system
>>>>> to avoid this limit seems an appropriate solution.
>>>>>
>>>>> On Sat, Sep 16, 2017 at 7:43 AM, Aaron Lun
>>>>> <alun at wehi.edu.au> wrote:
>>>>> Thanks Martin. Yes, it's quite unfortunate that scater drags in
>>>>> dplyr and ggplot2, which - combined with Bioconductor's core
>>>>> packages - already puts us pretty close to the limit without doing
>>>>> anything else!
>>>>>
>>>>>
>>>>> A solution might be to split my workflow into self-contained
>>>>> components, each of which can become its own workflow package (e.g.,
>>>>> simpleSingleCell1, simpleSingleCell2, simpleSingleCell3 and so on).
>>>>> This should avoid all of the problems and our associated hacks.
>>>>>
>>>>>
>>>>> I'm happy to do this, but is it possible for the website to indicate
>>>>> that there is a connection between the component workflows? For
>>>>> example, the link that ordinarily goes to the compiled workflow
>>>>> could instead go to an indexing page, which contains links to
>>>>> individual component workflows.
>>>>>
>>>>>
>>>>> -Aaron
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Martin Morgan <martin.morgan at roswellpark.org>
>>>>> Sent: Saturday, 16 September 2017 8:18:09 PM
>>>>> To: Aaron Lun; bioc-devel at r-project.org
>>>>> Subject: Re: [Bioc-devel] [Untrusted Server]Re: strange error in
>>>>> Jenkins build for singleCellWorkflow
>>>>>
>>>>> On 09/16/2017 01:53 AM, Aaron Lun wrote:
>>>>>> Bumping this rather old thread. To re-iterate, I'm updating my
>>>>>> simpleSingleCell workflow and I'm running into R's DLL limit. I've
>>>>>> added a code block halfway through the workflow that unloads all
>>>>>> DLLs and cleans them out, and this works fine during compilation on
>>>>>> my local machine.
>>>>>>
>>>>>>
>>>>>> However, it seems that the BioC workflow builder uses a
>>>>>> pre-processing step whereby it first tries to load all packages
>>>>>> contained within library() calls. This hits the DLL limit as it
>>>>>> doesn't execute the protective code block, which defeats the
>>>>>> purpose of all my fiddling in the first place.
>>>>>>
>>>>>>
>>>>>> What options are there? I'm happy to split my workflow into
>>>>>> multiple smaller Rmarkdown files that get compiled separately,
>>>>>> provided there is appropriate support for this setup from the build
>>>>>> system
>>>>>
>>>>> The workflows have been standardized as packages. The packages put the
>>>>> workflow dependencies in the 'Depends:' field, with the idea being that
>>>>> the user installing the workflow package 'in the usual way' will get
>>>>> the
>>>>> packages used in the vignette installed in their system 'in the usual
>>>>> way' without having to execute special variants of biocLite() /
>>>>> install.packages() / funky code in the vignette itself to be able to
>>>>> build the vignette.
>>>>>
>>>>> Loading a package loads its Depends: (and Imports:) so triggers the
>>>>> problem.
>>>>>
>>>>> Writing separate vignettes would not help with this (but might make the
>>>>> workflow more palatable; I'm not 100% sure of support for separate work
>>>>> flows in a single package, there is no problem with having multiple
>>>>> workflow packages on the same general topic).
>>>>>
>>>>> One could move (some?) packages to Suggests: and use your trick of
>>>>> unloading packages part-way through the vignette. But then users will
>>>>> find that they need to install packages to complete the vignette.
>>>>>
>>>>> 'We' could add a support for a BBS option that increases
>>>>> R_MAX_NUM_DLLS,
>>>>> but that would allow the workflow to build on the build system, but not
>>>>> on the users' system.
>>>>>
>>>>> I think also the R-core approach to this
>>>>> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073529.html,
>>>>> https://github.com/wch/r-source/commit/757bfa1d7ff373a604d6d34617f9cad78e0c875e)
>>>>> is a little insightful, where one could imagine increasing the default
>>>>> R_MAX_NUM_DLLS, but apparently on some OS these compete for number of
>>>>> open files, and this in turn can be quite low.
>>>>>
>>>>> I note that users have already struggled with the DLL problem 'in the
>>>>> wild' (https://stackoverflow.com/a/45552926/547331). This seems
>>>>> particularly problematic for workflows, which are appealing to
>>>>> relatively novice users.
>>>>>
>>>>> At the end of the day I think the workflows should make realistic
>>>>> use of
>>>>> R resources. I think this means modifying the workflow to use fewer
>>>>> DLLs. (this general comment is relevant to other workflows, which for
>>>>> instance start by downloading very large data sets -- I know that less
>>>>> constrained use of computing resources is supposed to be a selling
>>>>> point
>>>>> of the workflows, but in excess this seems counter-productive to their
>>>>> primary use as pedagogic tools [rather than, for instance,
>>>>> comprehensive
>>>>> exemplars of reproducible research]).
>>>>>
>>>>> Maybe there is additional discussion about some of the technical
>>>>> aspects
>>>>> of workflows that others might contribute.
>>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> ________________________________
>>>>>> From: Bioc-devel <bioc-devel-bounces at r-project.org>
>>>>>> on behalf of Aaron Lun <alun at wehi.edu.au>
>>>>>> Sent: Wednesday, 21 June 2017 12:09:13 AM
>>>>>> To: bioc-devel at r-project.org
>>>>>> Subject: [Untrusted Server]Re: [Bioc-devel] strange error in
>>>>>> Jenkins build for singleCellWorkflow
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>> I'm getting a curious error in the Jenkins log when I try to build
>>>>>> the singleCellWorkflow:
>>>>>>
>>>>>>
>>>>>> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/48/label=master/console
>>>>>>
>>>>>>
>>>>>>
>>>>>> The key part is at the bottom:
>>>>>>
>>>>>>
>>>>>> Error: package or namespace load failed for 'GenomicFeatures' in
>>>>>> dyn.load(file, DLLpath = DLLpath, ...):
>>>>>>   unable to load shared object
>>>>>> '/var/lib/jenkins/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/libs/Rsamtools.so':
>>>>>>
>>>>>>    `maximal number of DLLs reached...
>>>>>>
>>>>>>
>>>>>> The workflow had previously been running fine on the build system;
>>>>>> I'm not quite sure what's going on here, given that it's not even
>>>>>> failing at the point where I made the latest changes.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>>          [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>>        [[alternative HTML version deleted]]
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> This email message may contain legally privileged and/or
>>>>> confidential information.  If you are not the intended recipient(s),
>>>>> or the employee or agent responsible for the delivery of this
>>>>> message to the intended recipient(s), you are hereby notified that
>>>>> any disclosure, copying, distribution, or use of this email message
>>>>> is prohibited. If you have received this message in error, please
>>>>> notify the sender immediately by e-mail and delete this email
>>>>> message from your computer. Thank you.
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>>
>>>>>
>>>>>     [[alternative HTML version deleted]]
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


