[Bioc-devel] library() calls removed in simpleSingleCell workflow

Martin Morgan martin.morgan at roswellpark.org
Thu Oct 12 17:23:05 CEST 2017


Tomas Kalibera on R-core says that in R-devel

> I've increased the number of DLLs... Now it is 614 on systems where
> the soft limit on open files allows, but R now attempts to increase
> the limit when needed. If this is not possible, the maximum will be
> smaller. R will fail to start if the maximum could not be at least
> 100 (so users who rely on previous behavior where the default was
> also 100 are fine).
> 
> One can still use the environment variable R_MAX_NUM_DLLS to require 
> a specific maximum. R will try to increase the limit on open files
> if needed. But if not possible, R will fail to start with an error 
> (which is the same behavior as before the change).
> 
> I tested on Linux, macOS, Solaris and Windows. On the macOS and
> Solaris systems I use, the default soft limit is 256, but R will
> increase it to 1024 and so could load up to 614 DLLs.

It would be great if people gave this a whirl; note that there are
currently no Bioc binary builds that officially support R-devel.
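
For anyone who wants to compare their own setup with the older R 3.4.x
rule quoted further down in this thread (R_MAX_NUM_DLLS can be at most
60% of the open-file limit, and never more than 1000), here is a minimal
shell sketch. The helper name max_dlls is made up for illustration and is
not part of R; the new R-devel behavior described above may well differ,
so treat the output as a rough guide only.

```shell
#!/bin/bash
# Upper bound on R_MAX_NUM_DLLS under the R 3.4.x rule quoted in this
# thread: 60% of the open-file limit, capped at 1000.
# max_dlls is a hypothetical helper name, not something R provides.
max_dlls() {
    local fd_limit=$1
    local cap=$(( fd_limit * 6 / 10 ))   # integer arithmetic rounds down
    if [ "$cap" -gt 1000 ]; then
        cap=1000
    fi
    echo "$cap"
}

# Report the bound for the current shell's soft limit, if it is numeric
# (ulimit may also print "unlimited").
soft=$(ulimit -Sn)
case "$soft" in
    ''|*[!0-9]*) echo "soft limit is '$soft'; no numeric bound to compute" ;;
    *) echo "soft limit $soft -> at most $(max_dlls "$soft") DLLs" ;;
esac
```

With the limits reported in this thread, max_dlls 256 gives 153 and
max_dlls 1024 gives 614, matching the figures people posted.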

Martin

On 10/06/2017 04:49 PM, Henrik Bengtsson wrote:
> I haven't tried (= had to do it) myself, so I don't know exactly what
> it takes, but you can configure this "ulimit" of number of open 
> files, e.g. instructions in 
> https://stackoverflow.com/a/34645/1072091.  I suspect it requires 
> admin rights, but I'm not sure - maybe this is what goes on when you 
> run it in different types of terminals.
> 
> About this open file/DLL limit: in src/main/Rdynload.c
> (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180)
> there's the following comment/clarification:
> 
> /* Note that it is likely that dlopen will use up at least one file 
> descriptor for each DLL loaded (it may load further dynamically 
> linked libraries), so we do not want to get close to the fd limit 
> (which may be as low as 256). By default, the maximum number of DLLs
> that can be loaded is 100. When the fd limit is known, we allow
> increasing the maximum number of DLLs via environment variable up to
> 60% of the limit on open files, but to no more than 1000. */
> 
> I always thought that "as low as 256" was for some archaic system, 
> but, as Wolfgang points out, it's a relevant limit.  Since 0.6*256 =
>  153, this explains that the choice of the current default of a 
> maximum 100 DLLs is reasonable and requests to bump it up much
> higher may not be feasible (not cross-platform).
> 
> 
> Related to this - "Garbage collection of DLLs":
> 
> I've implemented R.utils::gcDLLs() that "Identifies and removes 
> ["stray"] DLLs of packages already unloaded".  This function will 
> free up DLL slots otherwise occupied by unloaded packages.  I've
> used it successfully in many places, e.g. trying to load and unload
> all my installed packages in a single R session (don't ask why ;)).
> 
> However, as argued by Karl Millar 
> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html), 
> there is a risk that unregistering such DLLs may render the state of 
> R unstable because we cannot know for sure whether there are some 
> registered finalizers that rely on such DLLs that yet haven't been 
> called.  R.utils::gcDLLs() forces the garbage collector to run prior
>  to unregistering DLLs, which should eliminate the risk for this 
> problem.  As far as I understand the current R implementation, this 
> should be enough.  On the other hand, I've been wrong before, I don't
> know about future versions of R, and it has only been tested so much.
> Guaranteeing reentrancy of finalizers is really tricky.
> 
> /Henrik
> 
> On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber 
> <wolfgang.huber at embl.de> wrote:
>> Interesting! In iTerm2, I get
>> 
>> $ ulimit -Sn
>> 4864
>> 
>> and env R_MAX_NUM_DLLS=1000 R
>> 
>> works, which means that on Mac it IS possible to have many more 
>> DLLs open than 100 if R is started in the right way.
>> 
>> Wolfgang
>> 
>> PS I meant OS X 10.12.6, too. Sorry for the typo.
>> 
>> 
>> 6.10.17 14:50, Kasper Daniel Hansen scripsit:
>>> 
>>> On OS X 10.12.6 (I don't think 10.12.16 exists), I get
>>> 
>>> $ ulimit -Sn
>>> 7168
>>> 
>>> Interestingly, this is because I use iTerm2 for my command line 
>>> prompt. If I do the same command in Terminal I get 256.  If I 
>>> start R inside of Emacs I get 256 as well.  I don't know
>>> anything about ulimit and how it is set, but that is a pretty
>>> stark difference.
>>> 
>>> Best, Kasper
>>> 
>>> 
>>> 
>>> On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber 
>>> <wolfgang.huber at embl.de> wrote:
>>> 
>>> On Mac OSX 10.12.16:
>>> 
>>> $ ulimit -Sn
>>> 256
>>> 
>>> so the maximum value of R_MAX_NUM_DLLS is 153 ...
>>> 
>>> Wolfgang
>>> 
>>> 5.10.17 23:02, Henrik Bengtsson scripsit:
>>> 
>>> About the DLL limit:
>>> 
>>> Just wanna make sure you're aware of "new" environment variable 
>>> R_MAX_NUM_DLLS available in R (>= 3.4.0).  It allows you to push
>>>  the current default limit of 100 open DLLs a bit higher.  It
>>> can be set in .Renviron or before, e.g.
>>> 
>>> $ R_MAX_NUM_DLLS=500 R
>>> 
>>> This, of course, assumes that you can set it, which you might not
>>> be able to do on build servers.  Also, there is an upper limit
>>> min(0.6*fd_limit,1000) that depends on the number of files you
>>> can have open at the same time (fd_limit), e.g. on my Ubuntu 
>>> 16.04 I've got:
>>> 
>>> $ ulimit -Sn
>>> 1024
>>> 
>>> so R_MAX_NUM_DLLS=614 is the maximum for me.
>>> 
>>> /Henrik
>>> 
>>> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber 
>>> <wolfgang.huber at embl.de> wrote:
>>> 
>>> 
>>> Breaking up long workflows into several smaller "modules" each 
>>> with a clearly defined input and output is a good idea, certainly
>>> for didactic & maintenance reasons.
>>> 
>>> It doesn't "solve" the DLL issue though, it only avoids it (for 
>>> now)...
>>> 
>>> I believe you can use a Makefile for your vignettes
>>> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
>>> and this might be a good way of managing which depends on which. 
>>> For passing along output/input, perhaps local .RData files are 
>>> good enough, perhaps some wheel-reinventing can also be avoided 
>>> by using
>>> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
>>> (haven't actually used it yet, though).
>>> 
>>> Wolfgang
>>> 
>>> 
>>> 
>>> 5.10.17 20:02, Aaron Lun scripsit:
>>> 
>>> 
>>> This may relate to what I was thinking with respect to solving 
>>> the DLL problem, by breaking up large workflows into modules
>>> that can be executed in separate R sessions. The same approach
>>> would also make it easier to associate package dependencies with 
>>> specific parts of the workflow.
>>> 
>>> 
>>> In my particular situation, it is easy to break up the workflow 
>>> into sections that can be executed completely independently. 
>>> However, I can also imagine situations where dependencies on 
>>> previous objects, etc. make it difficult to break up the 
>>> workflow. If multiple files are present in vignettes/, can they 
>>> be directed to execute in a specific order, and would output 
>>> files from one vignette persist during the execution of another?
>>> 
>>> 
>>> -Aaron
>>> 
>>> 
>>> ------------------------------------------------------------------------
>>> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
>>> *Sent:* Thursday, 5 October 2017 6:23:47 PM
>>> *To:* Laurent Gatto; Aaron Lun
>>> *Cc:* bioc-devel at r-project.org
>>> *Subject:* Re: [Bioc-devel] library() calls removed in 
>>> simpleSingleCell workflow
>>> 
>>> 
>>> I agree it is nice to be able to only load the packages needed 
>>> for a certain section of a vignette and not the whole thing. And 
>>> that too many `::` can make code look unwieldy (though some may 
>>> actually increase readability).
>>> 
>>> But relying on manually sprinkled in `library` calls seems like
>>> a hack prone to error. And there are always bound to be 
>>> dependencies that are non-local, e.g. on general infrastructure 
>>> like SummarizedExperiment, ggplot2, dplyr.
>>> 
>>> So: do we need a way to computationally determine the 
>>> dependencies of a vignette section, including 
>>> highlighting/eliminating potential name clashes (b/c the
>>> warnings about masking emitted at package loading are easily
>>> ignored)? This seems like a straightforward engineering task.
>>> 
>>> Eventually with such code analysis we could get rid of explicit 
>>> `library` calls altogether :)
>>> 
>>> Wolfgang
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 5.10.17 08:53, Laurent Gatto scripsit:
>>> 
>>> 
>>> 
>>> On  5 October 2017 00:11, Aaron Lun wrote:
>>> 
>>> Here's another two cents from me:
>>> 
>>> The explicit library() calls allow for easy copy-pasting if 
>>> people only want to use/adapt a section of the workflow. In such 
>>> cases, calling "library(simpleSingleCell)" could drag in a lot
>>> of unnecessary packages (e.g., which could hit the DLL limit). 
>>> Reading through the text to figure out the requirements for each
>>>  code chunk seems like a pain, and lots of "::" are unwieldy.
>>> 
>>> More generally, the removal of individual library() calls seems 
>>> to encourage the use of a single "library(simpleSingleCell)"
>>> call at the top of any user-developed custom analysis scripts
>>> based on the workflow. This seems conceptually odd to me - the 
>>> simpleSingleCell package is simply a vehicle for the compiled 
>>> workflow, it shouldn't be involved in analyses of other data.
>>> 
>>> 
>>> 
>>> I can confirm that this is a possibility.
>>> 
>>> Before workflows became available, I created the RforProteomics 
>>> package that essentially provided one relatively large vignette 
>>> to demonstrate a variety of applications of R/Bioconductor for 
>>> mass spectrometry and proteomics. I think this has been a useful 
>>> way to disseminate R and Bioconductor in these respective 
>>> communities, but it also led to the confusion that it was that 
>>> package that "did all the stuff", i.e. people saying that they 
>>> were using RforProteomics to do a task that was described in the 
>>> vignette. The RforProteomics vignette does explicitly call 
>>> library at the beginning of each section and explains that the 
>>> package is only a collection of analyses stemming from other 
>>> packages, but that wasn't enough apparently.
>>> 
>>> Laurent
>>> 
>>> 
>>> -Aaron
>>> 
>>> ________________________________
>>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
>>> Wolfgang Huber <wolfgang.huber at embl.de>
>>> Sent: Thursday, 5 October 2017 8:26 AM
>>> To: bioc-devel at r-project.org
>>> Subject: Re: [Bioc-devel] library() calls removed in 
>>> simpleSingleCell workflow
>>> 
>>> 
>>> I find `eval=FALSE` chunks not a good idea, since
>>> - they confuse users who only see the rendered HTML/PDF (where
>>>   this flag is not shown)
>>> - they are not tested, so more prone to code rot.
>>> 
>>> I'd also like to object to the idea that proximity of a
>>> `library` call to code that uses a package is somehow didactic.
>>> It's actually a bad habit: the R interpreter does not care. The 
>>> relevant package
>>> - can be mentioned in the narrative,
>>> - can be stated in the code with the pkgname:: prefix.
>>> The latter is good didactics to get people used to the idea of
>>> namespaces, especially since there is an increasing frequency of
>>> name clashes in CRAN, tidyverse, BioC (e.g. consider the various
>>> functions named 'filter' and the obscure misbehaviors that can
>>> result from these).
>>> 
>>> Best wishes Wolfgang
>>> 
>>> On 04/10/2017 22:20, Turaga, Nitesh wrote:
>>> 
>>> 
>>> Hi Aaron,
>>> 
>>> 
>>> A workaround may be to put all the library() calls in an
>>> "eval=FALSE" R code chunk:
>>> 
>>> ```{r, eval=FALSE}
>>> library(scran)
>>> library(scater)
>>> ```
>>> 
>>> etc.
>>> 
>>> 
>>> This way the users can see the library() calls in the vignette.
>>> 
>>> Best,
>>> 
>>> Nitesh
>>> 
>>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie 
>>> <Valerie.Obenchain at RoswellPark.org> wrote:
>>> 
>>> Hi guys,
>>> 
>>> A little background on this vignette -> package conversion. The 
>>> workflows were converted to package form because we want to 
>>> integrate them into the nightly build system instead of 
>>> supporting separate machines as we're now doing.
>>> 
>>> As part of this conversion, packages loaded in workflow
>>> vignettes were moved to Depends in DESCRIPTION. This enables the
>>> user to load a single package instead of many. Packages were
>>> moved to Depends instead of Suggests (as is usually done with
>>> software packages) because the vignette is the only thing
>>> these workflow packages have going - no defined classes or
>>> methods. This seemed 
>>> a more tidy approach and the dependencies are listed in Depends 
>>> for the user to see. This was my (maybe bad?) idea and Nitesh
>>> was the messenger. If you feel the individual loading of packages
>>> in the vignette is a key part of the instruction/learning we can 
>>> leave them as is and list the packages in Suggests.
>>> 
>>> 
>>> 
>>> I should also mention that incorporating the workflows into the 
>>> build system won't happen until after the release. At that time 
>>> we'll move the repositories from svn to git and it's likely
>>> we'll have to ask maintainers to abide by some time/space
>>> guidelines. At that point the build machines will be building
>>> software, experimental data and workflows, and resources aren't
>>> unlimited. 
>>> When that time comes we'll update the workflow guidelines and 
>>> contact maintainers.
>>> 
>>> 
>>> 
>>> Thanks. Valerie
>>> 
>>> 
>>> 
>>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>> 
>>> yeah, that is super super useful to people. In my vignettes 
>>> (granted, not workflows) I have a separate "Dependencies"
>>> section which is basically a series of library() calls.
>>> 
>>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <alun at wehi.edu.au>
>>> wrote:
>>> 
>>> 
>>> 
>>> Dear Nitesh, list;
>>> 
>>> 
>>> The library() calls in the simpleSingleCell workflow have been 
>>> removed. Why is this? I find explicit library() calls to be
>>> quite useful for readers of the compiled vignette, because it
>>> makes it easier for them to determine the packages that are
>>> required to adapt parts of the workflow for their own analyses.
>>> If it doesn't hurt the build system, I would prefer to have these
>>> library() calls in the vignette.
>>> 
>>> 
>>> Cheers,
>>> 
>>> 
>>> Aaron
>>> 
>>> [[alternative HTML version deleted]]
>>> 
>>> 
>>> _______________________________________________ 
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>> 
>>> 
>>> 
>>> 
>>> 
>>> This email message may contain legally privileged and/or 
>>> confidential information.  If you are not the intended 
>>> recipient(s), or the employee or agent responsible for the 
>>> delivery of this message to the intended recipient(s), you are 
>>> hereby notified that any disclosure, copying, distribution, or 
>>> use of this email message is prohibited.  If you have received
>>> this message in error, please notify the sender immediately by
>>> e-mail and delete this email message from your computer. Thank you.
>>> 
>>> 
>>> 
>>> -- With thanks in advance- Wolfgang
>>> 
>>> ------- Wolfgang Huber Principal Investigator, EMBL Senior 
>>> Scientist European Molecular Biology Laboratory (EMBL) 
>>> Heidelberg, Germany
>>> 
>>> wolfgang.huber at embl.de <mailto:wolfgang.huber at embl.de> 
>>> http://www.huber.embl.de
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 

