[Bioc-devel] library() calls removed in simpleSingleCell workflow
Henrik Bengtsson
henrik.bengtsson at gmail.com
Fri Oct 6 22:49:57 CEST 2017
I haven't tried this myself (I haven't had to), so I don't know exactly
what it takes, but you can configure this "ulimit" on the number of open
files; see e.g. the instructions in https://stackoverflow.com/a/34645/1072091.
I suspect it requires admin rights, but I'm not sure - maybe this is
what goes on when you run it in different types of terminals.
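For illustration only: on most Unix-like systems a process may change its own soft limit on open files up to the hard limit (`ulimit -Hn`) without admin rights; only raising the hard limit needs privileges. The sketch below assumes a POSIX-like shell whose `ulimit` accepts `-Sn`. The helper name `with_fd_limit` is hypothetical and not part of R or any package.

```shell
# Hypothetical helper: run a command with a different soft limit on open files.
# The function body is a subshell, so the calling shell's limit is unchanged.
with_fd_limit() (
  limit="$1"
  shift
  # Fails if the requested value exceeds the hard limit (see: ulimit -Hn).
  ulimit -Sn "$limit" || exit 1
  exec "$@"
)

# Possible usage (assumes the hard limit is at least 1024):
#   with_fd_limit 1024 env R_MAX_NUM_DLLS=500 R
```

Whether a given terminal or editor starts with a soft limit of 256 or several thousand depends on how that program was launched, so wrapping the R invocation this way is one way to make the limit explicit.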
About this open file/DLL limit: in src/main/Rdynload.c
(https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180)
there's the following comment/clarification:
/* Note that it is likely that dlopen will use up at least one file
descriptor for each DLL loaded (it may load further dynamically
linked libraries), so we do not want to get close to the fd limit
(which may be as low as 256). By default, the maximum number of DLLs
that can be loaded is 100. When the fd limit is known, we allow
increasing the maximum number of DLLs via environment variable up to
60% of the limit on open files, but to no more than 1000.
*/
I always thought that "as low as 256" was for some archaic system,
but, as Wolfgang points out, it's a relevant limit. Since 0.6*256 =
153.6, i.e. at most 153 DLLs, this explains why the current default of a
maximum of 100 DLLs is reasonable and why requests to bump it up much
higher may not be feasible (not cross-platform).
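The arithmetic used in this thread, min(0.6*fd_limit, 1000) rounded down, can be sketched as a small shell function. This is only an illustration of the bound as described here, not R's actual implementation, and `max_dlls` is a hypothetical helper name.

```shell
# Sketch: largest R_MAX_NUM_DLLS value implied by a given soft fd limit,
# i.e. min(floor(0.6 * fd_limit), 1000) as discussed in this thread.
# max_dlls is a hypothetical helper, not part of R.
max_dlls() {
  fd_limit="$1"
  # Reject non-numeric input such as "unlimited" rather than guessing.
  case "$fd_limit" in
    ''|*[!0-9]*)
      echo "non-numeric fd limit: $fd_limit" >&2
      return 1
      ;;
  esac
  cap=$(( fd_limit * 6 / 10 ))   # integer division rounds 0.6 * fd_limit down
  if [ "$cap" -gt 1000 ]; then
    cap=1000
  fi
  echo "$cap"
}

max_dlls 256    # 153  (Terminal / Emacs on OS X)
max_dlls 1024   # 614  (Ubuntu 16.04)
max_dlls 4864   # 1000 (iTerm2; capped)
```

To check the current shell, pass it the soft limit directly: `max_dlls "$(ulimit -Sn)"`.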
Related to this - "Garbage collection of DLLs":
I've implemented R.utils::gcDLLs() that "Identifies and removes
["stray"] DLLs of packages already unloaded". This function will free
up DLL slots otherwise occupied by unloaded packages. I've used it
successfully in many places, e.g. trying to load and unload all my
installed packages in a single R session (don't ask why ;)).
However, as argued by Karl Millar
(https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html),
there is a risk that unregistering such DLLs may render the state of R
unstable, because we cannot know for sure whether there are registered
finalizers that rely on such DLLs and haven't yet been called.
R.utils::gcDLLs() forces the garbage collector to run prior
to unregistering DLLs, which should eliminate the risk of this
problem. As far as I understand the current R implementation, this
should be enough. On the other hand, I've been wrong before, I don't
know about future versions of R, and it has only been tested so much.
Guaranteeing reentrancy of finalizers is really tricky.
/Henrik
On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
> Interesting! In iTerm2, I get
> $ ulimit -Sn
> 4864
>
> and
> env R_MAX_NUM_DLLS=1000 R
>
> works, which means that on Mac it IS possible to have many more DLLs open
> than 100 if R is started in the right way.
>
> Wolfgang
>
> PS I meant OS X 10.12.6, too. Sorry for the typo.
>
>
> 6.10.17 14:50, Kasper Daniel Hansen scripsit:
>>
>> On OS X 10.12.6 (I don't think 10.12.16 exists), I get
>>
>> $ ulimit -Sn
>> 7168
>>
>> Interestingly, this is because I use iTerm2 for my command line prompt.
>> If I do the same command in Terminal I get 256. If I start R inside of
>> Emacs I get 256 as well. I don't know anything about ulimit and how it is
>> set, but that is a pretty stark difference.
>>
>> Best,
>> Kasper
>>
>>
>>
>> On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
>>
>> On Mac OSX 10.12.16:
>> $ ulimit -Sn
>> 256
>>
>> so the maximum value of R_MAX_NUM_DLLS is 153 ...
>>
>> Wolfgang
>>
>> 5.10.17 23:02, Henrik Bengtsson scripsit:
>>
>> About the DLL limit:
>>
>> Just wanna make sure you're aware of the "new" environment variable
>> R_MAX_NUM_DLLS available in R (>= 3.4.0). It allows you to push the
>> current default limit of 100 open DLLs a bit higher. It can be set in
>> .Renviron or before launching R, e.g.
>>
>> $ R_MAX_NUM_DLLS=500 R
>>
>> This, of course, assumes that you can set it, which you might not be
>> able to do on build servers. Also, there is an upper limit
>> min(0.6*fd_limit,1000) that depends on the number of files you can
>> have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
>> got:
>>
>> $ ulimit -Sn
>> 1024
>>
>> so R_MAX_NUM_DLLS=614 is the maximum for me.
>>
>> /Henrik
>>
>> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
>>
>>
>> Breaking up long workflows into several smaller "modules"
>> each with a
>> clearly defined input and output is a good idea, certainly
>> for didactic &
>> maintenance reasons.
>>
>> It doesn't "solve" the DLL issue though, it only avoids it
>> (for now)...
>>
>> I believe you can use a Makefile for your vignettes
>> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
>> and this might be a good way of managing which depends on
>> which. For passing along output/input, perhaps local .RData
>> files are good enough; perhaps some wheel-reinventing can also
>> be avoided by using
>> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
>> (haven't actually used it yet, though).
>>
>> Wolfgang
>>
>>
>>
>> 5.10.17 20:02, Aaron Lun scripsit:
>>
>>
>> This may relate to what I was thinking with respect to
>> solving the DLL
>> problem, by breaking up large workflows into modules
>> that can be executed in
>> separate R sessions. The same approach would also make
>> it easier to
>> associate package dependencies with specific parts of
>> the workflow.
>>
>>
>> In my particular situation, it is easy to break up the
>> workflow into
>> sections that can be executed completely independently.
>> However, I can also
>> imagine situations where dependencies on previous
>> objects, etc. make it
>> difficult to break up the workflow. If multiple files
>> are present in
>> vignettes/, can they be directed to execute in a
>> specific order, and would
>> output files from one vignette persist during the
>> execution of another?
>>
>>
>> -Aaron
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
>> *Sent:* Thursday, 5 October 2017 6:23:47 PM
>> *To:* Laurent Gatto; Aaron Lun
>> *Cc:* bioc-devel at r-project.org
>> *Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow
>>
>>
>> I agree it is nice to be able to only load the packages
>> needed for a
>> certain section of a vignette and not the whole thing.
>> And that too many
>> `::` can make code look unwieldy (though some may
>> actually increase
>> readability).
>>
>> But relying on manually sprinkled in `library` calls
>> seems like a hack
>> prone to error. And there are always bound to be
>> dependencies that are
>> non-local, e.g. on general infrastructure like
>> SummarizedExperiment,
>> ggplot2, dplyr.
>>
>> So: do we need a way to computationally determine the
>> dependencies of a
>> vignette section, including highlighting/eliminating
>> potential name
>> clashes (b/c the warnings about masking emitted at
>> package loading are
>> easily ignored)? This seems like a straightforward
>> engineering task.
>>
>> Eventually with such code analysis we could get rid of
>> explicit
>> `library` calls altogether :)
>>
>> Wolfgang
>>
>>
>>
>>
>>
>> 5.10.17 08:53, Laurent Gatto scripsit:
>>
>>
>>
>> On 5 October 2017 00:11, Aaron Lun wrote:
>>
>> Here's another two cents from me:
>>
>> The explicit library() calls allow for easy
>> copy-pasting if people
>> only want to use/adapt a section of the
>> workflow. In such cases,
>> calling "library(simpleSingleCell)" could drag
>> in a lot of unnecessary
>> packages (e.g., which could hit the DLL limit).
>> Reading through the
>> text to figure out the requirements for each
>> code chunk seems like a
>> pain, and lots of "::" are unwieldy.
>>
>> More generally, the removal of individual
>> library() calls seems to
>> encourage the use of a single
>> "library(simpleSingleCell)" call at the
>> top of any user-developed custom analysis
>> scripts based on the
>> workflow. This seems conceptually odd to me -
>> the simpleSingleCell
>> package is simply a vehicle for the compiled
>> workflow, it shouldn't be
>> involved in analyses of other data.
>>
>>
>>
>> I can confirm that this is a possibility.
>>
>> Before workflows became available, I created the RforProteomics package
>> that essentially provided one relatively large vignette to demonstrate a
>> variety of applications of R/Bioconductor for mass spectrometry and
>> proteomics. I think this has been a useful way to disseminate R and
>> Bioconductor in these respective communities, but it also led to the
>> confusion that it was that package that "did all the stuff", i.e. people
>> saying that they were using RforProteomics to do a task that was
>> described in the vignette. The RforProteomics vignette does explicitly
>> call library at the beginning of each section and explains that the
>> package is only a collection of analyses stemming from other packages,
>> but that wasn't enough apparently.
>>
>> Laurent
>>
>>
>> -Aaron
>>
>> ________________________________
>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
>> Wolfgang Huber <wolfgang.huber at embl.de>
>> Sent: Thursday, 5 October 2017 8:26 AM
>> To: bioc-devel at r-project.org
>> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow
>>
>>
>> I find `eval=FALSE` chunks not a good idea, since
>> - they confuse users who only see the rendered
>> HTML/PDF (where this flag
>> is not shown)
>> - they are not tested, so more prone to code rot.
>>
>> I'd also like to object to the idea that
>> proximity of a `library` call
>> to code that uses a package is somehow didactic.
>> It's actually a bad
>> habit: the R interpreter does not care. The
>> relevant package
>> - can be mentioned in the narrative,
>> - stated in the code with the pkgname:: prefix.
>> The latter is good didactics to get people used
>> to the idea of
>> namespaces, especially since there is an
>> increasing frequency of name
>> clashes in CRAN, tidyverse, BioC (e.g. consider
>> the various functions
>> named 'filter' and the obscure malbehaviors that
>> can result from these).
>>
>> Best wishes
>> Wolfgang
>>
>> On 04/10/2017 22:20, Turaga, Nitesh wrote:
>>
>>
>> Hi Aaron,
>>
>>
>> A workaround may be to put all library() calls in an "eval=FALSE"
>> R code chunk:
>>
>> ```{r, eval=FALSE}
>> library(scran)
>> library(scater)
>> ```
>>
>> etc.
>>
>>
>> This way the users can see the library()
>> calls in the vignette.
>>
>> Best,
>>
>> Nitesh
>>
>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
>> <Valerie.Obenchain at RoswellPark.org> wrote:
>>
>> Hi guys,
>>
>> A little background on this vignette -> package conversion. The
>> workflows were converted to package form because we want to integrate them
>> into the nightly build system instead of supporting separate machines as
>> we're now doing.
>>
>> As part of this conversion, packages loaded in workflow vignettes were
>> moved to Depends in DESCRIPTION. This enables the user to load a single
>> package instead of many. Packages were moved to Depends instead of Suggests
>> (as is usually done with software packages) because the vignette is the
>> only thing these workflow packages have going - no defined classes or methods.
>> This seemed a more tidy approach and the dependencies are listed in Depends
>> for the user to see. This was my (maybe bad?) idea and Nitesh was the
>> messenger. If you feel the individual loading of packages in the vignette is a
>> key part of the instruction/learning we can leave them as is and list
>> the packages in Suggests.
>>
>>
>>
>> I should also mention that incorporating the workflows into the build
>> system won't happen until after the release. At that time we'll move the
>> repositories from svn to git and it's likely we'll have to ask maintainers
>> to abide by some time/space guidelines. At that point the build machines
>> will be building software, experimental data and workflows and resources aren't
>> unlimited. When that time comes we'll update the workflow guidelines and
>> contact maintainers.
>>
>>
>>
>> Thanks.
>> Valerie
>>
>>
>>
>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>
>> yeah, that is super super useful to people. In my vignettes (granted, not
>> workflows) I have a separate "Dependencies" section which is basically a
>> series of library() calls.
>>
>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <alun at wehi.edu.au> wrote:
>>
>>
>>
>> Dear Nitesh, list;
>>
>>
>> The library() calls in the simpleSingleCell workflow have been removed.
>> Why is this? I find explicit library() calls to be quite useful for readers
>> of the compiled vignette, because it makes it easier for them to determine
>> the packages that are required to adapt parts of the workflow for their own
>> analyses. If it doesn't hurt the build system, I would prefer to have these
>> library() calls in the vignette.
>>
>>
>> Cheers,
>>
>>
>> Aaron
>>
>> [[alternative HTML version
>> deleted]]
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> This email message may contain legally
>> privileged and/or confidential
>> information. If you are not the
>> intended recipient(s), or the employee or
>> agent responsible for the delivery of
>> this message to the intended
>> recipient(s), you are hereby notified
>> that any disclosure, copying,
>> distribution, or use of this email
>> message is
>>
>>
>> prohibited. If you have received this message in error,
>> please notify the
>> sender immediately by e-mail and delete this email
>> message from your
>> computer. Thank you.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> With thanks in advance-
>> Wolfgang
>>
>> -------
>> Wolfgang Huber
>> Principal Investigator, EMBL Senior Scientist
>> European Molecular Biology Laboratory (EMBL)
>> Heidelberg, Germany
>>
>> wolfgang.huber at embl.de
>> http://www.huber.embl.de
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>