[Bioc-devel] library() calls removed in simpleSingleCell workflow
Martin Morgan
martin.morgan at roswellpark.org
Thu Oct 12 17:23:05 CEST 2017
Tomas Kalibera on R-core says that in R-devel
> I've increased the number of DLLs... Now it is 614 on systems where
> the soft limit on open files allows, but R now attempts to increase
> the limit when needed. If this is not possible, the maximum will be
> smaller. R will fail to start if the maximum could not be at least
> 100 (so users who rely on previous behavior where the default was
> also 100 are fine).
>
> One can still use the environment variable R_MAX_NUM_DLLS to require
> a specific maximum. R will try to increase the limit on open files
> if needed. But if not possible, R will fail to start with an error
> (which is the same behavior as before the change).
>
> I tested on Linux, macOS, Solaris and Windows. On the macOS and
> Solaris systems I use, the default soft limit is 256, but R will
> increase it to 1024 and so could load up to 614 DLLs.
It would be great if people gave this a whirl; note that there are not
currently Bioc binary builds to officially support R-devel yet.
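
For anyone trying this out, here is a rough sketch of checking the
open-file limits before starting R. It assumes a bash-like shell where
`ulimit -Sn` and `ulimit -Hn` report the soft and hard limits. The helper
name can_raise is made up for illustration, and the target of 1024 is
taken from the message quoted above. The script only prints what it
finds; it does not start R.

```shell
#!/bin/bash
# can_raise HARD TARGET: succeed if a soft limit of TARGET fits under HARD.
# (can_raise is a hypothetical helper, not an R or shell built-in.)
can_raise() {
    local hard=$1 target=$2
    if [ "$hard" = "unlimited" ]; then
        return 0
    fi
    [ "$target" -le "$hard" ]
}

target=1024             # value R-devel is reported to raise the limit to
soft=$(ulimit -Sn)      # current soft limit on open files
hard=$(ulimit -Hn)      # hard limit; the soft limit cannot exceed it

echo "soft limit: $soft, hard limit: $hard"
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
    if can_raise "$hard" "$target"; then
        echo "soft limit could be raised to $target (e.g. 'ulimit -Sn $target')"
    else
        echo "hard limit is below $target; the DLL maximum will be smaller"
    fi
else
    echo "soft limit already allows $target open files"
fi
```
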
Martin
On 10/06/2017 04:49 PM, Henrik Bengtsson wrote:
> I haven't tried (= had to do it) myself, so I don't know exactly what
> it takes, but you can configure this "ulimit" of number of open
> files, e.g. instructions in
> https://stackoverflow.com/a/34645/1072091. I suspect it requires
> admin rights, but I'm not sure - maybe this is what goes on when you
> run it in different types of terminals.
>
> About this open file/DLL limit: in src/main/Rdynload.c
> (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180)
> there's the following comment/clarification:
>
> /* Note that it is likely that dlopen will use up at least one file
> descriptor for each DLL loaded (it may load further dynamically
> linked libraries), so we do not want to get close to the fd limit
> (which may be as low as 256). By default, the maximum number of DLLs
> that can be loaded is 100. When the fd limit is known, we allow
> increasing the maximum number of DLLs via environment variable up to
> 60% of the limit on open files, but to no more than 1000. */
>
> I always thought that "as low as 256" was for some archaic system,
> but, as Wolfgang points out, it's a relevant limit. Since 0.6*256 =
> 153, this explains why the current default of a maximum of 100
> DLLs is reasonable and why requests to bump it up much higher may
> not be feasible (not cross-platform).
>
>
> Related to this - "Garbage collection of DLLs":
>
> I've implemented R.utils::gcDLLs() that "Identifies and removes
> ["stray"] DLLs of packages already unloaded". This function will
> free up DLL slots otherwise occupied by unloaded packages. I've
> used it successfully in many places, e.g. trying to load and unload
> all my installed packages in a single R session (don't ask why ;)).
>
> However, as argued by Karl Millar
> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html),
> there is a risk that unregistering such DLLs may render the state of
> R unstable because we cannot know for sure whether there are some
> registered finalizers that rely on such DLLs that yet haven't been
> called. R.utils::gcDLLs() forces the garbage collector to run prior
> to unregistering DLLs, which should eliminate the risk for this
> problem. As far as I understand the current R implementation, this
> should be enough. On the other hand, I've been wrong before, I don't
> know about future versions of R, and it has only been tested so much.
> Guaranteeing reentrancy of finalizers is really tricky.
>
> /Henrik
>
> On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber
> <wolfgang.huber at embl.de> wrote:
>> Interesting! In iTerm2, I get
>>
>> $ ulimit -Sn
>> 4864
>>
>> and
>>
>> env R_MAX_NUM_DLLS=1000 R
>>
>> works, which means that on Mac it IS possible to have many more
>> DLLs open than 100 if R is started in the right way.
>>
>> Wolfgang
>>
>> PS I meant OS X 10.12.6, too. Sorry for the typo.
>>
>>
>> 6.10.17 14:50, Kasper Daniel Hansen scripsit:
>>>
>>> On OS X 10.12.6 (I don't think 10.12.16 exists), I get
>>>
>>> $ ulimit -Sn
>>> 7168
>>>
>>> Interestingly, this is because I use iTerm2 for my command line
>>> prompt. If I do the same command in Terminal I get 256. If I
>>> start R inside of Emacs I get 256 as well. I don't know
>>> anything about ulimit and how it is set, but that is a pretty
>>> stark difference.
>>>
>>> Best, Kasper
>>>
>>>
>>>
>>> On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber
>>> <wolfgang.huber at embl.de> wrote:
>>>
>>> On Mac OSX 10.12.16:
>>>
>>> $ ulimit -Sn
>>> 256
>>>
>>> so the maximum value of R_MAX_NUM_DLLS is 153 ...
>>>
>>> Wolfgang
>>>
>>> 5.10.17 23:02, Henrik Bengtsson scripsit:
>>>
>>> About the DLL limit:
>>>
>>> Just wanna make sure you're aware of the "new" environment variable
>>> R_MAX_NUM_DLLS available in R (>= 3.4.0). It allows you to push
>>> the current default limit of 100 open DLLs a bit higher. It
>>> can be set in .Renviron or before starting R, e.g.
>>>
>>> $ R_MAX_NUM_DLLS=500 R
>>>
>>> This, of course, assumes that you can set it, which you might not
>>> be able to do on build servers. Also, there is an upper limit
>>> min(0.6*fd_limit,1000) that depends on the number of files you
>>> can have open at the same time (fd_limit), e.g. on my Ubuntu
>>> 16.04 I've got:
>>>
>>> $ ulimit -Sn
>>> 1024
>>>
>>> so R_MAX_NUM_DLLS=614 is the maximum for me.
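
The arithmetic above can be sketched in the shell. This only illustrates
the min(0.6*fd_limit, 1000) rule quoted in this thread. The function name
dll_cap is made up here, and treating an "unlimited" soft limit as
leaving only the 1000 ceiling is an assumption.

```shell
#!/bin/bash
# dll_cap FD_LIMIT: largest R_MAX_NUM_DLLS allowed under the rule quoted in
# this thread, min(0.6 * fd_limit, 1000), rounded down.
# (dll_cap is a hypothetical helper name, not part of R.)
dll_cap() {
    local fd_limit=$1 cap
    # Assumption: an "unlimited" fd limit leaves only the 1000 ceiling.
    if [ "$fd_limit" = "unlimited" ]; then
        echo 1000
        return 0
    fi
    cap=$(( fd_limit * 6 / 10 ))    # integer form of floor(0.6 * fd_limit)
    if [ "$cap" -gt 1000 ]; then
        cap=1000
    fi
    echo "$cap"
}

# Report the cap for the soft limit of the current shell.
soft=$(ulimit -Sn)
echo "soft fd limit: $soft; R_MAX_NUM_DLLS can be at most $(dll_cap "$soft")"
```

With the limits reported in this thread, `dll_cap 256` gives 153 and
`dll_cap 1024` gives 614.
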
>>>
>>> /Henrik
>>>
>>> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber
>>> <wolfgang.huber at embl.de> wrote:
>>>
>>>
>>> Breaking up long workflows into several smaller "modules" each
>>> with a clearly defined input and output is a good idea, certainly
>>> for didactic & maintenance reasons.
>>>
>>> It doesn't "solve" the DLL issue though, it only avoids it (for
>>> now)...
>>>
>>> I believe you can use a Makefile for your vignettes
>>> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
>>> and this might be a good way of managing which depends on which.
>>> For passing along output/input, perhaps local .RData files are
>>> good enough; perhaps some wheel-reinventing can also be avoided
>>> by using
>>> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
>>> (haven't actually used it yet, though).
>>>
>>> Wolfgang
>>>
>>>
>>>
>>> 5.10.17 20:02, Aaron Lun scripsit:
>>>
>>>
>>> This may relate to what I was thinking with respect to solving
>>> the DLL problem, by breaking up large workflows into modules
>>> that can be executed in separate R sessions. The same approach
>>> would also make it easier to associate package dependencies with
>>> specific parts of the workflow.
>>>
>>>
>>> In my particular situation, it is easy to break up the workflow
>>> into sections that can be executed completely independently.
>>> However, I can also imagine situations where dependencies on
>>> previous objects, etc. make it difficult to break up the
>>> workflow. If multiple files are present in vignettes/, can they
>>> be directed to execute in a specific order, and would output
>>> files from one vignette persist during the execution of another?
>>>
>>>
>>> -Aaron
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
>>> *Sent:* Thursday, 5 October 2017 6:23:47 PM
>>> *To:* Laurent Gatto; Aaron Lun
>>> *Cc:* bioc-devel at r-project.org
>>> *Subject:* Re: [Bioc-devel] library() calls removed in
>>> simpleSingleCell workflow
>>>
>>>
>>> I agree it is nice to be able to only load the packages needed
>>> for a certain section of a vignette and not the whole thing. And
>>> that too many `::` can make code look unwieldy (though some may
>>> actually increase readability).
>>>
>>> But relying on manually sprinkled in `library` calls seems like
>>> a hack prone to error. And there are always bound to be
>>> dependencies that are non-local, e.g. on general infrastructure
>>> like SummarizedExperiment, ggplot2, dplyr.
>>>
>>> So: do we need a way to computationally determine the
>>> dependencies of a vignette section, including
>>> highlighting/eliminating potential name clashes (b/c the
>>> warnings about masking emitted at package loading are easily
>>> ignored)? This seems like a straightforward engineering task.
>>>
>>> Eventually with such code analysis we could get rid of explicit
>>> `library` calls altogether :)
>>>
>>> Wolfgang
>>>
>>>
>>>
>>>
>>>
>>> 5.10.17 08:53, Laurent Gatto scripsit:
>>>
>>>
>>>
>>> On 5 October 2017 00:11, Aaron Lun wrote:
>>>
>>> Here's another two cents from me:
>>>
>>> The explicit library() calls allow for easy copy-pasting if
>>> people only want to use/adapt a section of the workflow. In such
>>> cases, calling "library(simpleSingleCell)" could drag in a lot
>>> of unnecessary packages (e.g., which could hit the DLL limit).
>>> Reading through the text to figure out the requirements for each
>>> code chunk seems like a pain, and lots of "::" are unwieldy.
>>>
>>> More generally, the removal of individual library() calls seems
>>> to encourage the use of a single "library(simpleSingleCell)"
>>> call at the top of any user-developed custom analysis scripts
>>> based on the workflow. This seems conceptually odd to me - the
>>> simpleSingleCell package is simply a vehicle for the compiled
>>> workflow, it shouldn't be involved in analyses of other data.
>>>
>>>
>>>
>>> I can confirm that this is a possibility.
>>>
>>> Before workflows became available, I created the RforProteomics
>>> package that essentially provided one relatively large vignette
>>> to demonstrate a variety of applications of R/Bioconductor for
>>> mass spectrometry and proteomics. I think this has been a useful
>>> way to disseminate R and Bioconductor in these respective
>>> communities, but also led to the confusion that it was that
>>> package that "did all the stuff", i.e. people saying that they
>>> were using RforProteomics to do a task that was described in the
>>> vignette. The RforProteomics vignette does explicitly call
>>> library at the beginning of each section and explains that the
>>> package is only a collection of analyses stemming from other
>>> packages, but that wasn't enough apparently.
>>>
>>> Laurent
>>>
>>>
>>> -Aaron
>>>
>>> ________________________________
>>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
>>> Wolfgang Huber <wolfgang.huber at embl.de>
>>> Sent: Thursday, 5 October 2017 8:26 AM
>>> To: bioc-devel at r-project.org
>>> Subject: Re: [Bioc-devel] library() calls removed in
>>> simpleSingleCell workflow
>>>
>>>
>>> I find `eval=FALSE` chunks not a good idea, since
>>> - they confuse users who only see the rendered HTML/PDF (where
>>>   this flag is not shown)
>>> - they are not tested, so more prone to code rot.
>>>
>>> I'd also like to object to the idea that proximity of a
>>> `library` call to code that uses a package is somehow didactic.
>>> It's actually a bad habit: the R interpreter does not care. The
>>> relevant package
>>> - can be mentioned in the narrative,
>>> - stated in the code with the pkgname:: prefix.
>>> The latter is good
>>> didactics to get people used to the idea of namespaces,
>>> especially since there is an increasing frequency of name clashes
>>> in CRAN, tidyverse, BioC (e.g. consider the various functions
>>> named 'filter' and the obscure malbehaviors that can result from
>>> these).
>>>
>>> Best wishes Wolfgang
>>>
>>> On 04/10/2017 22:20, Turaga, Nitesh wrote:
>>>
>>>
>>> Hi Aaron,
>>>
>>>
>>> A workaround may be to put all library() calls in an
>>> “eval=FALSE” R code chunk:
>>>
>>> ```{r, eval=FALSE}
>>> library(scran)
>>> library(scater)
>>> ```
>>>
>>> etc.
>>>
>>>
>>> This way the users can see the library() calls in the vignette.
>>>
>>> Best,
>>>
>>> Nitesh
>>>
>>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
>>> <Valerie.Obenchain at RoswellPark.org> wrote:
>>>
>>> Hi guys,
>>>
>>> A little background on this vignette -> package conversion. The
>>> workflows were converted to package form because we want to
>>> integrate them into the nightly build system instead of
>>> supporting separate machines as we're now doing.
>>>
>>> As part of this conversion, packages loaded in workflow
>>> vignettes were moved to Depends in DESCRIPTION. This enables the
>>> user to load a single package instead of many. Packages were
>>> moved to Depends instead of Suggests (as is usually done with
>>> software packages) because the vignette is the only thing
>>> these workflow packages have going - no defined classes or
>>> methods. This seemed a more tidy approach and the dependencies
>>> are listed in Depends for the user to see. This was my (maybe
>>> bad?) idea and Nitesh was the messenger. If you feel the
>>> individual loading of packages in the vignette is a key part of
>>> the instruction/learning we can leave them as is and list the
>>> packages in Suggests.
>>>
>>>
>>>
>>> I should also mention that incorporating the workflows into the
>>> build system won't happen until after the release. At that time
>>> we'll move the repositories from svn to git and it's likely
>>> we'll have to ask maintainers to abide by some time/space
>>> guidelines. At that point the build machines will be building
>>> software, experimental data and workflows and resources aren't
>>> unlimited. When that time comes we'll update the workflow
>>> guidelines and contact maintainers.
>>>
>>>
>>>
>>> Thanks. Valerie
>>>
>>>
>>>
>>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>>
>>> yeah, that is super super useful to people. In my vignettes
>>> (granted, not workflows) I have a separate "Dependencies"
>>> section which is basically a series of library() calls.
>>>
>>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <alun at wehi.edu.au> wrote:
>>>
>>>
>>>
>>> Dear Nitesh, list;
>>>
>>>
>>> The library() calls in the simpleSingleCell workflow have been
>>> removed. Why is this? I find explicit library() calls to be
>>> quite useful for readers of the compiled vignette, because it
>>> makes it easier for them to determine the packages that are
>>> required to adapt parts of the workflow for their own analyses.
>>> If it doesn't hurt the build system, I would prefer to have these
>>> library() calls in the vignette.
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Aaron
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> This email message may contain legally privileged and/or
>>> confidential information. If you are not the intended
>>> recipient(s), or the employee or agent responsible for the
>>> delivery of this message to the intended recipient(s), you are
>>> hereby notified that any disclosure, copying, distribution, or
>>> use of this email message is prohibited. If you have received
>>> this message in error, please notify the sender immediately by
>>> e-mail and delete this email message from your computer. Thank
>>> you.
>>>
>>> -- With thanks in advance- Wolfgang
>>>
>>> ------- Wolfgang Huber Principal Investigator, EMBL Senior
>>> Scientist European Molecular Biology Laboratory (EMBL)
>>> Heidelberg, Germany
>>>
>>> wolfgang.huber at embl.de <mailto:wolfgang.huber at embl.de>
>>> http://www.huber.embl.de