[Bioc-devel] library() calls removed in simpleSingleCell workflow
Nan Xiao
road2stat at gmail.com
Sat Oct 7 00:27:45 CEST 2017
Hi guys,

This is a very interesting discussion; thanks for sharing your ideas.

On a possibly relevant note: during my earlier effort to create an
alternative workflow auto-build system based on Docker containerization
using my package liftr, I ran into similar issues in declaring
dependencies, and in dependency management in general. I think both
methods (declaring them in the Rmd or in the workflow package DESCRIPTION)
have potential drawbacks and advantages. Some of them are described in the
comments of the following file:

https://github.com/road2stat/dockflow/blob/master/src/2-containerize.R

Hope this helps,
-Nan
On Fri, Oct 6, 2017 at 4:49 PM, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
> I haven't tried (= had to do it) myself, so I don't know exactly what
> it takes, but you can configure this "ulimit" of number of open files,
> e.g. instructions in https://stackoverflow.com/a/34645/1072091. I
> suspect it requires admin rights, but I'm not sure - maybe this is
> what goes on when you run it in different types of terminals.
>
> About this open file/DLL limit: in src/main/Rdynload.c
> (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180)
> there's the following comment/clarification:
>
> /* Note that it is likely that dlopen will use up at least one file
> descriptor for each DLL loaded (it may load further dynamically
> linked libraries), so we do not want to get close to the fd limit
> (which may be as low as 256). By default, the maximum number of DLLs
> that can be loaded is 100. When the fd limit is known, we allow
> increasing the maximum number of DLLs via environment variable up to
> 60% of the limit on open files, but to no more than 1000.
> */
>
> I always thought that "as low as 256" was for some archaic system,
> but, as Wolfgang points out, it's a relevant limit. Since 0.6*256 =
> 153.6, which allows at most 153 DLLs, this explains why the current
> default maximum of 100 DLLs is reasonable and why requests to bump it
> up much higher may not be feasible (not cross-platform).
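
The arithmetic above can be sketched in shell. This is only a minimal
sketch, assuming the cap is min(floor(0.6 * fd_limit), 1000) as described
in this thread; `max_dlls` is a hypothetical helper name, not something R
provides.

```shell
# Hypothetical helper (not part of R): print the largest R_MAX_NUM_DLLS
# value permitted for a given open-file limit, assuming the rule
# min(floor(0.6 * fd_limit), 1000) quoted in this thread.
max_dlls() {
  fd_limit=$1
  cap=$(( fd_limit * 6 / 10 ))   # integer arithmetic: floor(0.6 * fd_limit)
  if [ "$cap" -gt 1000 ]; then
    cap=1000                     # hard upper bound mentioned in Rdynload.c
  fi
  echo "$cap"
}

max_dlls 256    # Terminal/Emacs on OS X in this thread: prints 153
max_dlls 1024   # Ubuntu 16.04 in this thread: prints 614
max_dlls 7168   # iTerm2 in this thread: prints 1000
```

Where `ulimit -Sn` prints a number, `max_dlls "$(ulimit -Sn)"` would give
the cap for the current shell. The soft limit can also be reported as
`unlimited`, which this sketch does not handle.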
>
>
> Related to this - "Garbage collection of DLLs":
>
> I've implemented R.utils::gcDLLs() that "Identifies and removes
> ["stray"] DLLs of packages already unloaded". This function will free
> up DLL slots otherwise occupied by unloaded packages. I've used it
> successfully in many places, e.g. trying to load and unload all my
> installed packages in a single R session (don't ask why ;)).
>
> However, as argued by Karl Millar
> (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html),
> there is a risk that unregistering such DLLs may render the state of R
> unstable because we cannot know for sure whether there are some
> registered finalizers that rely on such DLLs and haven't been called
> yet. R.utils::gcDLLs() forces the garbage collector to run prior
> to unregistering DLLs, which should eliminate the risk of this
> problem. As far as I understand the current R implementation, this
> should be enough. On the other hand, I've been wrong before, I don't
> know about future versions of R, and it has only been tested so much.
> Guaranteeing reentrancy of finalizers is really tricky.
>
> /Henrik
>
> On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber <wolfgang.huber at embl.de>
> wrote:
> > Interesting! In iTerm2, I get
> > $ ulimit -Sn
> > 4864
> >
> > and
> > env R_MAX_NUM_DLLS=1000 R
> >
> > works, which means that on Mac it IS possible to have many more DLLs open
> > than 100 if R is started in the right way.
> >
> > Wolfgang
> >
> > PS I meant OS X 10.12.6, too. Sorry for the typo.
> >
> >
> > 6.10.17 14:50, Kasper Daniel Hansen scripsit:
> >>
> >> On OS X 10.12.6 (I don't think 10.12.16 exists), I get
> >>
> >> $ ulimit -Sn
> >> 7168
> >>
> >> Interestingly, this is because I use iTerm2 for my command line prompt.
> >> If I do the same command in Terminal I get 256. If I start R inside of
> >> Emacs I get 256 as well. I don't know anything about ulimit and how it
> >> is set, but that is a pretty stark difference.
> >>
> >> Best,
> >> Kasper
> >>
> >>
> >>
> >> On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
> >>
> >> On Mac OSX 10.12.16:
> >> $ ulimit -Sn
> >> 256
> >>
> >> so the maximum value of R_MAX_NUM_DLLS is 153 ...
> >>
> >> Wolfgang
> >>
> >> 5.10.17 23:02, Henrik Bengtsson scripsit:
> >>
> >> About the DLL limit:
> >>
> >> Just wanna make sure you're aware of the "new" environment variable
> >> R_MAX_NUM_DLLS available in R (>= 3.4.0). It allows you to push the
> >> current default limit of 100 open DLLs a bit higher. It can be set in
> >> .Renviron or when launching R, e.g.
> >>
> >> $ R_MAX_NUM_DLLS=500 R
> >>
> >> This, of course, assumes that you can set it, which you might not be
> >> able to do on build servers. Also, there is an upper limit
> >> min(0.6*fd_limit,1000) that depends on the number of files you can
> >> have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
> >> got:
> >>
> >> $ ulimit -Sn
> >> 1024
> >>
> >> so R_MAX_NUM_DLLS=614 is the maximum for me.
> >>
> >> /Henrik
> >>
> >> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber
> >> <wolfgang.huber at embl.de> wrote:
> >>
> >> Breaking up long workflows into several smaller "modules" each with a
> >> clearly defined input and output is a good idea, certainly for
> >> didactic & maintenance reasons.
> >>
> >> It doesn't "solve" the DLL issue though, it only avoids it (for now)...
> >>
> >> I believe you can use a Makefile for your vignettes
> >> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
> >> and this might be a good way of managing which depends on which. For
> >> passing along output/input, perhaps local .RData files are good
> >> enough, perhaps some wheel-reinventing can also be avoided by using
> >> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
> >> (haven't actually used it yet, though).
> >>
> >> Wolfgang
> >>
> >>
> >>
> >> 5.10.17 20:02, Aaron Lun scripsit:
> >>
> >>
> >> This may relate to what I was thinking with respect to solving the DLL
> >> problem, by breaking up large workflows into modules that can be
> >> executed in separate R sessions. The same approach would also make it
> >> easier to associate package dependencies with specific parts of the
> >> workflow.
> >>
> >> In my particular situation, it is easy to break up the workflow into
> >> sections that can be executed completely independently. However, I can
> >> also imagine situations where dependencies on previous objects, etc.
> >> make it difficult to break up the workflow. If multiple files are
> >> present in vignettes/, can they be directed to execute in a specific
> >> order, and would output files from one vignette persist during the
> >> execution of another?
> >>
> >>
> >> -Aaron
> >>
> >>
> >> ------------------------------------------------------------------------
> >> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
> >> *Sent:* Thursday, 5 October 2017 6:23:47 PM
> >> *To:* Laurent Gatto; Aaron Lun
> >> *Cc:* bioc-devel at r-project.org
> >> *Subject:* Re: [Bioc-devel] library() calls removed in
> >> simpleSingleCell workflow
> >>
> >>
> >> I agree it is nice to be able to only load the packages needed for a
> >> certain section of a vignette and not the whole thing. And that too
> >> many `::` can make code look unwieldy (though some may actually
> >> increase readability).
> >>
> >> But relying on manually sprinkled in `library` calls seems like a hack
> >> prone to error. And there are always bound to be dependencies that are
> >> non-local, e.g. on general infrastructure like SummarizedExperiment,
> >> ggplot2, dplyr.
> >>
> >> So: do we need a way to computationally determine the dependencies of
> >> a vignette section, including highlighting/eliminating potential name
> >> clashes (b/c the warnings about masking emitted at package loading are
> >> easily ignored)? This seems like a straightforward engineering task.
> >>
> >> Eventually with such code analysis we could get rid of explicit
> >> `library` calls altogether :)
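
A very rough version of the dependency detection suggested above can be
sketched with standard shell tools. This is only a sketch under stated
assumptions: `list_pkgs` is a hypothetical helper name, it only sees
unquoted `library(pkg)` / `require(pkg)` calls and `pkg::` prefixes in the
text of a file, and it does nothing about name clashes or packages loaded
indirectly.

```shell
# Hypothetical helper: list package names that a file refers to via
# library(pkg), require(pkg) or a pkg:: prefix. Purely textual; it does
# not parse R code and misses quoted names such as library("pkg").
list_pkgs() {
  grep -oE '(library|require)\(([A-Za-z][A-Za-z0-9.]*)\)|[A-Za-z][A-Za-z0-9.]*::' "$1" |
    sed -E 's/^(library|require)\(//; s/\)$//; s/::$//' |
    sort -u
}

# Small demonstration on a temporary file standing in for a vignette.
demo=$(mktemp)
printf 'library(scran)\nx <- scater::plotPCA(sce)\nlibrary(scran)\n' > "$demo"
list_pkgs "$demo"   # prints "scater" and "scran", one per line
rm -f "$demo"
```

A real solution would need to work on the parsed code of each vignette
section rather than on raw text, which is presumably what "code analysis"
means in the message above.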
> >>
> >> Wolfgang
> >>
> >>
> >>
> >>
> >>
> >> 5.10.17 08:53, Laurent Gatto scripsit:
> >>
> >>
> >>
> >> On 5 October 2017 00:11, Aaron Lun wrote:
> >>
> >> Here's another two cents from me:
> >>
> >> The explicit library() calls allow for easy copy-pasting if people
> >> only want to use/adapt a section of the workflow. In such cases,
> >> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
> >> packages (e.g., which could hit the DLL limit). Reading through the
> >> text to figure out the requirements for each code chunk seems like a
> >> pain, and lots of "::" are unwieldy.
> >>
> >> More generally, the removal of individual library() calls seems to
> >> encourage the use of a single "library(simpleSingleCell)" call at the
> >> top of any user-developed custom analysis scripts based on the
> >> workflow. This seems conceptually odd to me - the simpleSingleCell
> >> package is simply a vehicle for the compiled workflow, it shouldn't be
> >> involved in analyses of other data.
> >>
> >>
> >>
> >> I can confirm that this is a possibility.
> >>
> >> Before workflows became available, I created the RforProteomics
> >> package that essentially provided one relatively large vignette to
> >> demonstrate a variety of applications of R/Bioconductor for mass
> >> spectrometry and proteomics. I think this has been a useful way to
> >> disseminate R and Bioconductor in these respective communities, but it
> >> also led to the confusion that it was that package that "did all the
> >> stuff", i.e. people saying that they were using RforProteomics to do a
> >> task that was described in the vignette. The RforProteomics vignette
> >> does explicitly call library at the beginning of each section and
> >> explains that the package is only a collection of analyses stemming
> >> from other packages, but that wasn't enough apparently.
> >>
> >> Laurent
> >>
> >>
> >> -Aaron
> >>
> >> ________________________________
> >> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> >> Wolfgang Huber <wolfgang.huber at embl.de>
> >> Sent: Thursday, 5 October 2017 8:26 AM
> >> To: bioc-devel at r-project.org
> >> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell
> >> workflow
> >>
> >>
> >> I find `eval=FALSE` chunks not a good idea, since
> >> - they confuse users who only see the rendered HTML/PDF (where this
> >>   flag is not shown)
> >> - they are not tested, so more prone to code rot.
> >>
> >> I'd also like to object to the idea that proximity of a `library` call
> >> to code that uses a package is somehow didactic. It's actually a bad
> >> habit: the R interpreter does not care. The relevant package
> >> - can be mentioned in the narrative,
> >> - can be stated in the code with the pkgname:: prefix.
> >> The latter is good didactics to get people used to the idea of
> >> namespaces, especially since there is an increasing frequency of name
> >> clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
> >> named 'filter' and the obscure malbehaviors that can result from
> >> these).
> >>
> >> Best wishes
> >> Wolfgang
> >>
> >> On 04/10/2017 22:20, Turaga, Nitesh wrote:
> >>
> >>
> >> Hi Aaron,
> >>
> >>
> >> A workaround may be to put all library() calls in an "eval=FALSE"
> >> R code chunk:
> >>
> >> ```{r, eval=FALSE}
> >> library(scran)
> >> library(scater)
> >> ```
> >>
> >> etc.
> >>
> >>
> >> This way the users can see the library() calls in the vignette.
> >>
> >> Best,
> >>
> >> Nitesh
> >>
> >> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
> >> <Valerie.Obenchain at RoswellPark.org> wrote:
> >>
> >> Hi guys,
> >>
> >> A little background on this vignette -> package conversion. The
> >> workflows were converted to package form because we want to integrate
> >> them into the nightly build system instead of supporting separate
> >> machines as we're now doing.
> >>
> >> As part of this conversion, packages loaded in workflow vignettes were
> >> moved to Depends in DESCRIPTION. This enables the user to load a single
> >> package instead of many. Packages were moved to Depends instead of
> >> Suggests (as is usually done with software packages) because the
> >> vignette is the only thing these workflow packages have going - no
> >> defined classes or methods. This seemed a more tidy approach and the
> >> dependencies are listed in Depends for the user to see. This was my
> >> (maybe bad?) idea and Nitesh was the messenger. If you feel the
> >> individual loading of packages in the vignette is a key part of the
> >> instruction/learning we can leave them as is and list the packages in
> >> Suggests.
> >>
> >> I should also mention that incorporating the workflows into the build
> >> system won't happen until after the release. At that time we'll move
> >> the repositories from svn to git and it's likely we'll have to ask
> >> maintainers to abide by some time/space guidelines. At that point the
> >> build machines will be building software, experimental data and
> >> workflows and resources aren't unlimited. When that time comes we'll
> >> update the workflow guidelines and contact maintainers.
> >>
> >>
> >>
> >> Thanks.
> >> Valerie
> >>
> >>
> >>
> >> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
> >>
> >> yeah, that is super super useful to people. In my vignettes (granted,
> >> not workflows) I have a separate "Dependencies" section which is
> >> basically a series of library() calls.
> >>
> >> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <alun at wehi.edu.au> wrote:
> >>
> >>
> >>
> >> Dear Nitesh, list;
> >>
> >>
> >> The library() calls in the simpleSingleCell workflow have been
> >> removed. Why is this? I find explicit library() calls to be quite
> >> useful for readers of the compiled vignette, because it makes it
> >> easier for them to determine the packages that are required to adapt
> >> parts of the workflow for their own analyses. If it doesn't hurt the
> >> build system, I would prefer to have these library() calls in the
> >> vignette.
> >>
> >>
> >> Cheers,
> >>
> >>
> >> Aaron
> >>
> >> [[alternative HTML version
> >> deleted]]
> >>
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >>
> >>
> >>
> >>
> >> This email message may contain legally privileged and/or confidential
> >> information. If you are not the intended recipient(s), or the employee
> >> or agent responsible for the delivery of this message to the intended
> >> recipient(s), you are hereby notified that any disclosure, copying,
> >> distribution, or use of this email message is prohibited. If you have
> >> received this message in error, please notify the sender immediately
> >> by e-mail and delete this email message from your computer. Thank you.
> >>
> >>
> >> --
> >> With thanks in advance-
> >> Wolfgang
> >>
> >> -------
> >> Wolfgang Huber
> >> Principal Investigator, EMBL Senior Scientist
> >> European Molecular Biology Laboratory (EMBL)
> >> Heidelberg, Germany
> >>
> >> wolfgang.huber at embl.de <mailto:wolfgang.huber at embl.de>
> >> http://www.huber.embl.de
>
--
https://nanx.me
[[alternative HTML version deleted]]