[Bioc-devel] library() calls removed in simpleSingleCell workflow

Nan Xiao road2stat at gmail.com
Sat Oct 7 00:27:45 CEST 2017


Hi guys,

This is a very interesting discussion, and thanks for sharing your ideas.

On a possibly related note: while building an alternative workflow
auto-build system based on Docker containerization with my package liftr,
I ran into similar issues with declaring dependencies and, more generally,
with dependency management. I think both approaches (declaring them in the
RMD or in the workflow package's DESCRIPTION) have advantages and
drawbacks; some of them are discussed in the comments of the following
file:

https://github.com/road2stat/dockflow/blob/master/src/2-containerize.R

Hope this helps,
-Nan

On Fri, Oct 6, 2017 at 4:49 PM, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:

> I haven't tried (= had to do it) myself, so I don't know exactly what
> it takes, but you can configure this "ulimit" of number of open files,
> e.g. instructions in https://stackoverflow.com/a/34645/1072091.  I
> suspect it requires admin rights, but I'm not sure - maybe this is
> what goes on when you run it in different types of terminals.
>
> About this open file/DLL limit: in src/main/Rdynload.c
> (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180)
> there's the following comment/clarification:
>
> /* Note that it is likely that dlopen will use up at least one file
> descriptor for each DLL loaded (it may load further dynamically
> linked libraries), so we do not want to get close to the fd limit
> (which may be as low as 256). By default, the maximum number of DLLs
> that can be loaded is 100. When the fd limit is known, we allow
> increasing the maximum number of DLLs via environment variable up to
> 60% of the limit on open files, but to no more than 1000.
> */
>
> I always thought that "as low as 256" was for some archaic system,
> but, as Wolfgang points out, it's a relevant limit.  Since 0.6*256 =
> 153.6, i.e. at most 153 DLLs, this explains why the current default
> maximum of 100 DLLs is reasonable and why requests to bump it up much
> higher may not be feasible (not cross-platform).
>
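The arithmetic quoted above can be reproduced in a few lines of shell. This
is only a sketch: the function name max_dlls is made up for illustration,
and it assumes the cap is min(0.6 * fd_limit, 1000) rounded down, which is
consistent with the 153 and 614 figures mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: largest R_MAX_NUM_DLLS value for a given limit on
# open files, assuming the rule min(0.6 * fd_limit, 1000), rounded down.
max_dlls() {
    fd_limit=$1
    cap=$(( fd_limit * 6 / 10 ))   # integer floor of 0.6 * fd_limit
    if [ "$cap" -gt 1000 ]; then
        cap=1000                   # hard upper bound from the comment
    fi
    echo "$cap"
}

max_dlls 256     # prints 153  (Terminal / Emacs on macOS in this thread)
max_dlls 1024    # prints 614  (the Ubuntu 16.04 example)
max_dlls 7168    # prints 1000 (iTerm2; capped)
```

With R >= 3.4.0 one could then start R as
`R_MAX_NUM_DLLS=$(max_dlls "$(ulimit -Sn)") R`, provided `ulimit -Sn`
prints a number rather than `unlimited`.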
>
> Related to this - "Garbage collection of DLLs":
>
> I've implemented R.utils::gcDLLs() that "Identifies and removes
> ["stray"] DLLs of packages already unloaded".  This function will free
> up DLL slots otherwise occupied by unloaded packages.  I've used it
> successfully in many places, e.g. trying to load and unload all my
> installed packages in a single R session (don't ask why ;)).
>
> However, as argued by Karl Millar
> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html),
> there is a risk that unregistering such DLLs may leave R in an unstable
> state, because we cannot know for sure whether some registered
> finalizers that rely on those DLLs have not been called yet.
> R.utils::gcDLLs() forces the garbage collector to run prior
> to unregistering DLLs, which should eliminate the risk of this
> problem.  As far as I understand the current R implementation, this
> should be enough.  On the other hand, I've been wrong before, I don't
> know about future versions of R, and it has only been tested so much.
> Guaranteeing reentrancy of finalizers is really tricky.
>
> /Henrik
>
> On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
> > Interesting! In iTerm2, I get
> > $ ulimit -Sn
> > 4864
> >
> > and
> > env R_MAX_NUM_DLLS=1000 R
> >
> > works, which means that on Mac it IS possible to have many more DLLs open
> > than 100 if R is started in the right way.
> >
> > Wolfgang
> >
> > PS I meant OS X 10.12.6, too. Sorry for the typo.
> >
> >
> > 6.10.17 14:50, Kasper Daniel Hansen scripsit:
> >>
> >> On OS X 10.12.6 (I don't think 10.12.16 exists), I get
> >>
> >> $ ulimit -Sn
> >> 7168
> >>
> >> Interestingly, this is because I use iTerm2 for my command line prompt.
> >> If I do the same command in Terminal I get 256.  If I start R inside of
> >> Emacs I get 256 as well.  I don't know anything about ulimit and how it
> >> is set, but that is a pretty stark difference.
> >>
> >> Best,
> >> Kasper
> >>
> >>
> >>
> >> On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
> >>
> >>     On Mac OSX 10.12.16:
> >>     $ ulimit -Sn
> >>     256
> >>
> >>     so the maximum value of R_MAX_NUM_DLLS is 153 ...
> >>
> >>              Wolfgang
> >>
> >>     5.10.17 23:02, Henrik Bengtsson scripsit:
> >>
> >>         About the DLL limit:
> >>
> >>         Just wanna make sure you're aware of "new" environment variable
> >>         R_MAX_NUM_DLLS available in R (>= 3.4.0).  It allows you to push
> >>         the current default limit of 100 open DLLs a bit higher.  It can
> >>         be set in .Renviron or on the command line before starting R,
> >>         e.g.
> >>
> >>         $ R_MAX_NUM_DLLS=500 R
> >>
> >>         This, of course, assumes that you can set it, which you might not
> >>         be able to do on build servers.  Also, there is an upper limit
> >>         min(0.6*fd_limit,1000) that depends on the number of files you can
> >>         have open at the same time (fd_limit), e.g. on my Ubuntu 16.04
> >>         I've got:
> >>
> >>         $ ulimit -Sn
> >>         1024
> >>
> >>         so R_MAX_NUM_DLLS=614 is the maximum for me.
> >>
> >>         /Henrik
> >>
> >>         On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber
> >>         <wolfgang.huber at embl.de> wrote:
> >>
> >>
> >>             Breaking up long workflows into several smaller "modules",
> >>             each with a clearly defined input and output, is a good idea,
> >>             certainly for didactic & maintenance reasons.
> >>
> >>             It doesn't "solve" the DLL issue though, it only avoids it
> >>             (for now)...
> >>
> >>             I believe you can use a Makefile for your vignettes
> >>             (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
> >>             and this might be a good way of managing which depends on
> >>             which. For passing along output/input, perhaps local .RData
> >>             files are good enough; perhaps some wheel-reinventing can
> >>             also be avoided by using
> >>             https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
> >>             (haven't actually used it yet, though).
> >>
> >>                       Wolfgang
> >>
> >>
> >>
> >>             5.10.17 20:02, Aaron Lun scripsit:
> >>
> >>
> >>                 This may relate to what I was thinking with respect to
> >>                 solving the DLL
> >>                 problem, by breaking up large workflows into modules
> >>                 that can be executed in
> >>                 separate R sessions. The same approach would also make
> >>                 it easier to
> >>                 associate package dependencies with specific parts of
> >>                 the workflow.
> >>
> >>
> >>                 In my particular situation, it is easy to break up the
> >>                 workflow into
> >>                 sections that can be executed completely independently.
> >>                 However, I can also
> >>                 imagine situations where dependencies on previous
> >>                 objects, etc. make it
> >>                 difficult to break up the workflow. If multiple files
> >>                 are present in
> >>                 vignettes/, can they be directed to execute in a
> >>                 specific order, and would
> >>                 output files from one vignette persist during the
> >>                 execution of another?
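
Outside the package build machinery, one simple way to get a fixed
execution order is a small driver script. A minimal sketch, assuming
rmarkdown is installed and each module is an .Rmd file; run_modules and
the RSCRIPT override are names invented here, and this says nothing about
how R CMD build orders the files in vignettes/.

```shell
#!/bin/sh
# Render each R Markdown file in its own R session, in the order given.
# RSCRIPT can be overridden (e.g. for a dry run); it defaults to Rscript.
run_modules() {
    for rmd in "$@"; do
        "${RSCRIPT:-Rscript}" -e \
            'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' \
            "$rmd" || return 1   # stop at the first module that fails
    done
}

# Example call with hypothetical file names:
# run_modules 01-qc.Rmd 02-normalization.Rmd 03-clustering.Rmd
```

Each file is rendered in a fresh R session, so DLLs loaded by one module
are gone before the next one starts, while anything a module writes to
disk (e.g. an .RData file) is still there for the following one.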
> >>
> >>
> >>                 -Aaron
> >>
> >>
> >> ------------------------------------------------------------------------
> >>                 *From:* Wolfgang Huber <wolfgang.huber at embl.de>
> >>                 *Sent:* Thursday, 5 October 2017 6:23:47 PM
> >>                 *To:* Laurent Gatto; Aaron Lun
> >>                 *Cc:* bioc-devel at r-project.org
> >>                 *Subject:* Re: [Bioc-devel] library() calls removed in
> >>                 simpleSingleCell workflow
> >>
> >>
> >>                 I agree it is nice to be able to only load the packages
> >>                 needed for a
> >>                 certain section of a vignette and not the whole thing.
> >>                 And that too many
> >>                 `::` can make code look unwieldy (though some may
> >>                 actually increase
> >>                 readability).
> >>
> >>                 But relying on manually sprinkled in `library` calls
> >>                 seems like a hack
> >>                 prone to error. And there are always bound to be
> >>                 dependencies that are
> >>                 non-local, e.g. on general infrastructure like
> >>                 SummarizedExperiment,
> >>                 ggplot2, dplyr.
> >>
> >>                 So: do we need a way to computationally determine the
> >>                 dependencies of a
> >>                 vignette section, including highlighting/eliminating
> >>                 potential name
> >>                 clashes (b/c the warnings about masking emitted at
> >>                 package loading are
> >>                 easily ignored)? This seems like a straightforward
> >>                 engineering task.
> >>
> >>                 Eventually with such code analysis we could get rid of
> >>                 explicit
> >>                 `library` calls altogether :)
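
One rough way to approximate this kind of analysis is a purely textual
scan. The sketch below (list_pkgs is a hypothetical helper, not part of
any package) lists names that appear in library()/require() calls or as
pkg:: prefixes in a file. It does not parse R code, so it misses quoted
names such as library("scran") and indirect dependencies, and it can pick
up matches in prose or comments.

```shell
#!/bin/sh
# Hypothetical helper: list package names that a file appears to use,
# based on library()/require() calls and pkg:: prefixes (text match only).
list_pkgs() {
    {
        # Unquoted names inside library(...) or require(...)
        grep -oE '(library|require)\([A-Za-z][A-Za-z0-9.]*\)' "$1" |
            sed -E 's/^(library|require)\(//; s/\)$//'
        # Names used as a namespace prefix, e.g. dplyr::filter
        grep -oE '[A-Za-z][A-Za-z0-9.]*::' "$1" | sed 's/::$//'
    } | sort -u
}

# Example call with a hypothetical path:
# list_pkgs vignettes/workflow.Rmd
```

Running it per section (after splitting the file) would give a first
approximation of section-level dependencies; detecting name clashes would
need real code analysis rather than pattern matching.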
> >>
> >>                            Wolfgang
> >>
> >>
> >>
> >>
> >>
> >>                 5.10.17 08:53, Laurent Gatto scripsit:
> >>
> >>
> >>
> >>                     On  5 October 2017 00:11, Aaron Lun wrote:
> >>
> >>                         Here's another two cents from me:
> >>
> >>                         The explicit library() calls allow for easy
> >>                         copy-pasting if people
> >>                         only want to use/adapt a section of the
> >>                         workflow. In such cases,
> >>                         calling "library(simpleSingleCell)" could drag
> >>                         in a lot of unnecessary
> >>                         packages (e.g., which could hit the DLL limit).
> >>                         Reading through the
> >>                         text to figure out the requirements for each
> >>                         code chunk seems like a
> >>                         pain, and lots of "::" are unwieldy.
> >>
> >>                         More generally, the removal of individual
> >>                         library() calls seems to
> >>                         encourage the use of a single
> >>                         "library(simpleSingleCell)" call at the
> >>                         top of any user-developed custom analysis
> >>                         scripts based on the
> >>                         workflow. This seems conceptually odd to me -
> >>                         the simpleSingleCell
> >>                         package is simply a vehicle for the compiled
> >>                         workflow, it shouldn't be
> >>                         involved in analyses of other data.
> >>
> >>
> >>
> >>                     I can confirm that this is a possibility.
> >>
> >>                     Before workflows became available, I created the
> >>                     RforProteomics package
> >>                     that essentially provided one relatively large
> >>                     vignette to demonstrate a
> >>                     variety of applications of R/Bioconductor for mass
> >>                     spectrometry and
> >>                     proteomics. I think this has been a useful way to
> >>                     disseminate R and
> >>                     Bioconductor in these respective communities, but
> >>                     also led to the
> >>                     confusion that it was that package that "did all the
> >>                     stuff", i.e. people
> >>                     saying that they were using RforProteomics to do a
> >>                     task that was
> >>                     described in the vignette. The RforProteomics
> >>                     vignette does explicitly
> >>                     call library at the beginning of each section and
> >>                     explains that the
> >>                     package is only a collection of analyses stemming
> >>                     from other packages,
> >>                     but that wasn't enough apparently.
> >>
> >>                     Laurent
> >>
> >>
> >>                         -Aaron
> >>
> >>                         ________________________________
> >>                         From: Bioc-devel
> >>                         <bioc-devel-bounces at r-project.org> on behalf of
> >>                         Wolfgang Huber <wolfgang.huber at embl.de>
> >>                         Sent: Thursday, 5 October 2017 8:26 AM
> >>                         To: bioc-devel at r-project.org
> >>                         Subject: Re: [Bioc-devel] library() calls
> >>                         removed in simpleSingleCell workflow
> >>
> >>
> >>                         I find `eval=FALSE` chunks not a good idea, since
> >>                         - they confuse users who only see the rendered
> >>                         HTML/PDF (where this flag is not shown)
> >>                         - they are not tested, so more prone to code rot.
> >>
> >>                         I'd also like to object to the idea that
> >>                         proximity of a `library` call
> >>                         to code that uses a package is somehow didactic.
> >>                         It's actually a bad
> >>                         habit: the R interpreter does not care. The
> >>                         relevant package
> >>                         - can be mentioned in the narrative,
> >>                         - stated in the code with the pkgname:: prefix.
> >>                         The latter is good didactics to get people used
> >>                         to the idea of
> >>                         namespaces, especially since there is an
> >>                         increasing frequency of name
> >>                         clashes in CRAN, tidyverse, BioC (e.g. consider
> >>                         the various functions
> >>                         named 'filter' and the obscure malbehaviors that
> >>                         can result from these).
> >>
> >>                         Best wishes
> >>                                             Wolfgang
> >>
> >>                         On 04/10/2017 22:20, Turaga, Nitesh wrote:
> >>
> >>
> >>                             Hi Aaron,
> >>
> >>
> >>                             A workaround may be to put all the
> >>                             library() calls in an “eval=FALSE” R code
> >>                             chunk:
> >>
> >>                             ```{r, eval=FALSE}
> >>                             library(scran)
> >>                             library(scater)
> >>                             ```
> >>
> >>                             etc.
> >>
> >>
> >>                             This way the users can see the library()
> >>                             calls in the vignette.
> >>
> >>                             Best,
> >>
> >>                             Nitesh
> >>
> >>                                 On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
> >>                                 <Valerie.Obenchain at RoswellPark.org> wrote:
> >>
> >>                                 Hi guys,
> >>
> >>                                 A little background on this vignette ->
> >>                                 package conversion. The
> >>                                 workflows were converted to package form
> >>                                 because we want to integrate them
> >>                                 into the nightly build system instead of
> >>                                 supporting separate machines as
> >>                                 we're now doing.
> >>
> >>                                 As part of this conversion, packages
> >>                                 loaded in workflow vignettes were
> >>                                 moved to Depends in DESCRIPTION. This
> >>                                 enables the user to load a single
> >>                                 package instead of many. Packages were
> >>                                 moved to Depends instead of Suggests
> >>                                 (as is usually done with software
> >>                                 packages) because the vignette is the
> >>                                 only thing these workflow packages have
> >>                                 going - no defined classes or methods.
> >>                                 This seemed a more tidy approach and
> >>                                 the dependencies are listed in Depends
> >>                                 for the user to see. This was my (maybe
> >>                                 bad?) idea and Nitesh was the
> >>                                 messenger. If you feel the individual
> >>                                 loading of packages in the vignette is
> >>                                 a key part of the instruction/learning
> >>                                 we can leave them as is and list the
> >>                                 packages in Suggests.
> >>
> >>
> >>
> >>                                 I should also mention that incorporating
> >>                                 the workflows into the build
> >>                                 system won't happen until after the
> >>                                 release. At that time we'll move the
> >>                                 repositories from svn to git and it's
> >>                                 likely we'll have to ask maintainers
> >>                                 to abide by some time/space guidelines.
> >>                                 At that point the build machines will
> >>                                 be building software, experimental data
> >>                                 and workflows, and resources aren't
> >>                                 unlimited. When that time comes we'll
> >>                                 update the workflow guidelines and
> >>                                 contact maintainers.
> >>
> >>
> >>
> >>                                 Thanks.
> >>                                 Valerie
> >>
> >>
> >>
> >>                                 On 10/04/2017 12:27 PM, Kasper Daniel
> >>                                 Hansen wrote:
> >>
> >>                                 yeah, that is super super useful to
> >>                                 people. In my vignettes (granted, not
> >>                                 workflows) I have a separate
> >>                                 "Dependencies" section which is basically
> >>                                 a series of library() calls.
> >>
> >>                                 On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun
> >>                                 <alun at wehi.edu.au> wrote:
> >>
> >>
> >>
> >>                                 Dear Nitesh, list;
> >>
> >>
> >>                                 The library() calls in the
> >>                                 simpleSingleCell workflow have been
> >>                                 removed. Why is this? I find explicit
> >>                                 library() calls to be quite useful for
> >>                                 readers of the compiled vignette, because
> >>                                 it makes it easier for them to determine
> >>                                 the packages that are required to adapt
> >>                                 parts of the workflow for their own
> >>                                 analyses. If it doesn't hurt the build
> >>                                 system, I would prefer to have these
> >>                                 library() calls in the vignette.
> >>
> >>
> >>                                 Cheers,
> >>
> >>
> >>                                 Aaron
> >>
> >>                                 [[alternative HTML version deleted]]
> >>
> >>                                 _______________________________________________
> >>                                 Bioc-devel at r-project.org mailing list
> >>                                 https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >>
> >>
> >>
> >>                                 This email message may contain legally
> >>                                 privileged and/or confidential
> >>                                 information.  If you are not the
> >>                                 intended recipient(s), or the employee or
> >>                                 agent responsible for the delivery of
> >>                                 this message to the intended
> >>                                 recipient(s), you are hereby notified
> >>                                 that any disclosure, copying,
> >>                                 distribution, or use of this email
> >>                                 message is prohibited.  If you have
> >>                                 received this message in error, please
> >>                                 notify the sender immediately by e-mail
> >>                                 and delete this email message from your
> >>                                 computer. Thank you.
> >>
> >>
> >>                 --
> >>                 With thanks in advance-
> >>                 Wolfgang
> >>
> >>                 -------
> >>                 Wolfgang Huber
> >>                 Principal Investigator, EMBL Senior Scientist
> >>                 European Molecular Biology Laboratory (EMBL)
> >>                 Heidelberg, Germany
> >>
> >>                 wolfgang.huber at embl.de
> >>                 http://www.huber.embl.de
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>



-- 
https://nanx.me