[Bioc-devel] library() calls removed in simpleSingleCell workflow

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Fri Oct 6 14:50:15 CEST 2017


On OS X 10.12.6 (I don't think 10.12.16 exists), I get

$ ulimit -Sn
7168

Interestingly, this is because I use iTerm2 for my command line prompt.  If
I run the same command in Terminal I get 256.  If I start R inside of Emacs
I get 256 as well.  I don't know anything about ulimit and how it is set,
but that is a pretty stark difference.
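The soft limit on open files is a per-process resource limit that child processes inherit, which would explain why a shell started from iTerm2 and one started from Terminal report different values (that iTerm2 raises the limit for its own sessions is an assumption, not something checked here). A minimal POSIX shell sketch for inspecting the limits and raising the soft limit for a single command only; `run_with_fd_limit` is a hypothetical helper name:

```shell
#!/bin/sh
# Show the soft and hard limits on open file descriptors for this shell.
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"

# run_with_fd_limit (hypothetical helper): run one command with a different
# soft limit. Everything happens in a subshell, so the calling shell keeps
# its own limit. If the requested value exceeds the hard limit, ulimit fails
# and the command is not run.
run_with_fd_limit() {
  (
    ulimit -Sn "$1" || exit 1
    shift
    exec "$@"
  )
}

# Example (not run here): start R with a soft limit of 1024 and a raised
# DLL cap.
# run_with_fd_limit 1024 env R_MAX_NUM_DLLS=500 R
```

The subshell is the design choice that matters: raising the limit this way affects only the command being launched, not the terminal session it is launched from.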

Best,
Kasper



On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.huber at embl.de>
wrote:

> On Mac OSX 10.12.16:
> $ ulimit -Sn
> 256
>
> so the maximum value of R_MAX_NUM_DLLS is 153 ...
>
>         Wolfgang
>
> 5.10.17 23:02, Henrik Bengtsson scripsit:
>
>> About the DLL limit:
>>
>> Just want to make sure you're aware of the "new" environment variable
>> R_MAX_NUM_DLLS, available in R (>= 3.4.0).  It allows you to push the
>> current default limit of 100 open DLLs a bit higher.  It can be set in
>> .Renviron or in the environment when launching R, e.g.
>>
>> $ R_MAX_NUM_DLLS=500 R
>>
>> This, of course, assumes that you can set it, which you might not be
>> able to do on build servers.  Also, there is an upper limit
>> min(0.6*fd_limit,1000) that depends on the number of files you can
>> have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
>> got:
>>
>> $ ulimit -Sn
>> 1024
>>
>> so R_MAX_NUM_DLLS=614 is the maximum for me.
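The upper limit above is easy to apply by hand, but a small helper makes it explicit. This shell sketch takes the rule min(0.6*fd_limit, 1000) exactly as stated in this message; truncating to an integer is an assumption, chosen because it reproduces both the 614 (for 1024) and the 153 (for 256) quoted in this thread. `max_num_dlls` is a hypothetical helper name.

```shell
#!/bin/sh
# max_num_dlls (hypothetical helper): largest R_MAX_NUM_DLLS value allowed
# by the rule min(0.6 * fd_limit, 1000), given a soft file-descriptor limit.
# Shell arithmetic is integer-only, so 0.6 * fd_limit is computed as
# fd_limit * 6 / 10, which truncates (1024 -> 614, 256 -> 153).
max_num_dlls() {
  fd_limit=$1
  # An "unlimited" soft limit leaves only the fixed cap of 1000.
  if [ "$fd_limit" = "unlimited" ]; then
    echo 1000
    return 0
  fi
  cap=$(( fd_limit * 6 / 10 ))
  if [ "$cap" -gt 1000 ]; then
    cap=1000
  fi
  echo "$cap"
}

# Report the value for the current shell's soft limit.
echo "R_MAX_NUM_DLLS can be at most $(max_num_dlls "$(ulimit -Sn)")"
```

Under the same rule, a soft limit of 7168 (as reported from iTerm2 in this thread) gives 4300 before the cap, so the effective maximum there is 1000.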
>>
>> /Henrik
>>
>> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.huber at embl.de>
>> wrote:
>>
>>>
>>> Breaking up long workflows into several smaller "modules" each with a
>>> clearly defined input and output is a good idea, certainly for didactic &
>>> maintenance reasons.
>>>
>>> It doesn't "solve" the DLL issue though, it only avoids it (for now)...
>>>
>>> I believe you can use a Makefile for your vignettes
>>> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
>>> and this might be a good way of managing which vignette depends on which.
>>> For passing along output/input, perhaps local .RData files are good
>>> enough; perhaps some wheel-reinventing can also be avoided by using
>>> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
>>> (haven't actually used it yet, though).
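On the ordering and hand-off questions raised in this thread, here is a minimal shell sketch that assumes the modules are standalone R Markdown files rendered with `rmarkdown::render()` outside of `R CMD build`. This is not how the build system processes a vignettes/ Makefile. Each module runs in its own `Rscript` process, so DLLs loaded by one module do not count toward the limit in the next, while files a module saves (e.g. .RData files) remain on disk for later modules. `render_modules`, the `RENDER` override, and the module names are hypothetical.

```shell
#!/bin/sh
# render_modules (hypothetical helper): render workflow modules in the given
# order, each in its own R process, and stop at the first failure.
# A fresh process per module means DLLs loaded by one module do not count
# toward the DLL limit of the next. Files a module writes to disk (e.g. an
# .RData file) are still there when the following modules run.
# RENDER (hypothetical) lets a stub command replace Rscript, e.g. for testing.
render_modules() {
  for rmd in "$@"; do
    echo "Rendering $rmd"
    if [ -n "${RENDER:-}" ]; then
      "$RENDER" "$rmd" || return 1
    else
      # File names containing single quotes would break this R expression.
      Rscript -e "rmarkdown::render('$rmd')" || return 1
    fi
  done
}

# Example with made-up module names (not run here):
# render_modules 01-quality-control.Rmd 02-normalization.Rmd
```

Stopping at the first failure keeps a later module from silently running against a stale or missing intermediate file.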
>>>
>>>          Wolfgang
>>>
>>>
>>>
>>> 5.10.17 20:02, Aaron Lun scripsit:
>>>
>>>>
>>>> This may relate to what I was thinking with respect to solving the DLL
>>>> problem, by breaking up large workflows into modules that can be
>>>> executed in separate R sessions. The same approach would also make it
>>>> easier to associate package dependencies with specific parts of the
>>>> workflow.
>>>>
>>>>
>>>> In my particular situation, it is easy to break up the workflow into
>>>> sections that can be executed completely independently. However, I can
>>>> also imagine situations where dependencies on previous objects, etc. make
>>>> it difficult to break up the workflow. If multiple files are present in
>>>> vignettes/, can they be directed to execute in a specific order, and
>>>> would output files from one vignette persist during the execution of
>>>> another?
>>>>
>>>>
>>>> -Aaron
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
>>>> *Sent:* Thursday, 5 October 2017 6:23:47 PM
>>>> *To:* Laurent Gatto; Aaron Lun
>>>> *Cc:* bioc-devel at r-project.org
>>>> *Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
>>>> workflow
>>>>
>>>>
>>>> I agree it is nice to be able to only load the packages needed for a
>>>> certain section of a vignette and not the whole thing. And that too many
>>>> `::` can make code look unwieldy (though some may actually increase
>>>> readability).
>>>>
>>>> But relying on manually sprinkled-in `library` calls seems like a hack
>>>> prone to error. And there are always bound to be dependencies that are
>>>> non-local, e.g. on general infrastructure like SummarizedExperiment,
>>>> ggplot2, dplyr.
>>>>
>>>> So: do we need a way to computationally determine the dependencies of a
>>>> vignette section, including highlighting/eliminating potential name
>>>> clashes (b/c the warnings about masking emitted at package loading are
>>>> easily ignored)? This seems like a straightforward engineering task.
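As a very crude starting point for that kind of analysis, the shell sketch below lists the packages an R Markdown file refers to, either through library()/require() calls or through `pkg::` and `pkg:::` prefixes. It is purely textual, not real code analysis: it does not parse R, ignores chunk boundaries (so a mention such as "the pkgname:: prefix" in the narrative is picked up too), misses requireNamespace() and single-quoted names, and cannot detect masking. `rmd_packages` is a hypothetical helper name.

```shell
#!/bin/sh
# rmd_packages (hypothetical helper): list the packages an R Markdown file
# mentions via library(pkg), require(pkg), pkg:: or pkg:::.
# This is plain text matching, not code analysis: it scans the whole file
# (narrative included), skips requireNamespace() and single-quoted names,
# and knows nothing about masking.
rmd_packages() {
  grep -oE '(library|require)\("?[A-Za-z][A-Za-z0-9.]*"?\)|[A-Za-z][A-Za-z0-9.]*:::?' "$1" |
    sed -E 's/^(library|require)\("?//; s/"?\)$//; s/:::?$//' |
    LC_ALL=C sort -u
}
```

Running it per section (e.g. on a file split at each heading) would give a rough per-section dependency list; a serious version would parse the R code rather than match text.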
>>>>
>>>> Eventually with such code analysis we could get rid of explicit
>>>> `library` calls altogether :)
>>>>
>>>>           Wolfgang
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 5.10.17 08:53, Laurent Gatto scripsit:
>>>>
>>>>>
>>>>>
>>>>> On  5 October 2017 00:11, Aaron Lun wrote:
>>>>>
>>>>>> Here's another two cents from me:
>>>>>>
>>>>>> The explicit library() calls allow for easy copy-pasting if people
>>>>>> only want to use/adapt a section of the workflow. In such cases,
>>>>>> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
>>>>>> packages (which could, e.g., hit the DLL limit). Reading through the
>>>>>> text to figure out the requirements for each code chunk seems like a
>>>>>> pain, and lots of "::" are unwieldy.
>>>>>>
>>>>>> More generally, the removal of individual library() calls seems to
>>>>>> encourage the use of a single "library(simpleSingleCell)" call at the
>>>>>> top of any user-developed custom analysis scripts based on the
>>>>>> workflow. This seems conceptually odd to me - the simpleSingleCell
>>>>>> package is simply a vehicle for the compiled workflow, it shouldn't be
>>>>>> involved in analyses of other data.
>>>>>>
>>>>>
>>>>>
>>>>> I can confirm that this is a possibility.
>>>>>
>>>>> Before workflows became available, I created the RforProteomics package
>>>>> that essentially provided one relatively large vignette to demonstrate a
>>>>> variety of applications of R/Bioconductor for mass spectrometry and
>>>>> proteomics. I think this has been a useful way to disseminate R and
>>>>> Bioconductor in these respective communities, but it also led to the
>>>>> confusion that it was that package that "did all the stuff", i.e. people
>>>>> saying that they were using RforProteomics to do a task that was
>>>>> described in the vignette. The RforProteomics vignette does explicitly
>>>>> call library at the beginning of each section and explains that the
>>>>> package is only a collection of analyses stemming from other packages,
>>>>> but apparently that wasn't enough.
>>>>>
>>>>> Laurent
>>>>>
>>>>>
>>>>>> -Aaron
>>>>>>
>>>>>> ________________________________
>>>>>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
>>>>>> Wolfgang Huber <wolfgang.huber at embl.de>
>>>>>> Sent: Thursday, 5 October 2017 8:26 AM
>>>>>> To: bioc-devel at r-project.org
>>>>>> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell
>>>>>> workflow
>>>>>>
>>>>>>
>>>>>> I don't think `eval=FALSE` chunks are a good idea, since
>>>>>> - they confuse users who only see the rendered HTML/PDF (where this flag
>>>>>>   is not shown)
>>>>>> - they are not tested, so they are more prone to code rot.
>>>>>>
>>>>>> I'd also like to object to the idea that proximity of a `library` call
>>>>>> to code that uses a package is somehow didactic. It's actually a bad
>>>>>> habit: the R interpreter does not care. The relevant package
>>>>>> - can be mentioned in the narrative,
>>>>>> - can be stated in the code with the pkgname:: prefix.
>>>>>> The latter is good didactics to get people used to the idea of
>>>>>> namespaces, especially since name clashes are increasingly frequent
>>>>>> across CRAN, tidyverse and BioC (e.g. consider the various functions
>>>>>> named 'filter' and the obscure misbehavior that can result from
>>>>>> them).
>>>>>>
>>>>>> Best wishes
>>>>>>                    Wolfgang
>>>>>>
>>>>>> On 04/10/2017 22:20, Turaga, Nitesh wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Aaron,
>>>>>>>
>>>>>>>
>>>>>>> A workaround might be to put all the library() calls in an `eval=FALSE`
>>>>>>> R code chunk:
>>>>>>>
>>>>>>> ```{r, eval=FALSE}
>>>>>>> library(scran)
>>>>>>> library(scater)
>>>>>>> ```
>>>>>>>
>>>>>>> etc.
>>>>>>>
>>>>>>>
>>>>>>> This way the users can see the library() calls in the vignette.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Nitesh
>>>>>>>
>>>>>>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
>>>>>>> <Valerie.Obenchain at RoswellPark.org> wrote:
>>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> A little background on this vignette -> package conversion. The
>>>>>>>> workflows were converted to package form because we want to integrate
>>>>>>>> them into the nightly build system instead of supporting separate
>>>>>>>> machines as we're now doing.
>>>>>>>>
>>>>>>>> As part of this conversion, packages loaded in workflow vignettes were
>>>>>>>> moved to Depends in DESCRIPTION. This enables the user to load a single
>>>>>>>> package instead of many. Packages were moved to Depends instead of
>>>>>>>> Suggests (as is usually done with software packages) because the
>>>>>>>> vignette is the only thing these workflow packages have going for them;
>>>>>>>> they define no classes or methods. This seemed a tidier approach, and
>>>>>>>> the dependencies are listed in Depends for the user to see. This was my
>>>>>>>> (maybe bad?) idea and Nitesh was the messenger. If you feel the
>>>>>>>> individual loading of packages in the vignette is a key part of the
>>>>>>>> instruction/learning, we can leave them as is and list the packages in
>>>>>>>> Suggests.
>>>>>>>>
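For readers who want to see those Depends entries without loading anything, here is a small shell sketch that prints one field of a package's DESCRIPTION file, one package per line. It assumes the usual DCF layout, where a field may continue on lines that start with whitespace, and it simply drops version constraints such as (>= 3.4.0). `description_field` is a hypothetical helper name.

```shell
#!/bin/sh
# description_field (hypothetical helper): print the packages listed in one
# field (default: Depends) of an R package DESCRIPTION file, one per line.
# DESCRIPTION is in DCF format: a field may continue on following lines that
# begin with whitespace. Version constraints like "(>= 3.4.0)" are removed;
# "R" itself is kept if it is listed.
description_field() {
  file=$1
  field=${2:-Depends}
  awk -v field="$field" '
    $0 ~ "^" field ":" { infield = 1; sub("^" field ":", ""); print; next }
    /^[ \t]/ && infield { print; next }
    { infield = 0 }
  ' "$file" |
    tr ',' '\n' |
    sed -E 's/\(.*\)//; s/^[[:space:]]+//; s/[[:space:]]+$//' |
    grep -v '^$'
}
```

For example, `description_field DESCRIPTION Suggests` would list the Suggests entries instead, which is where the packages would end up under the alternative described above.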
>>>>>>>> I should also mention that incorporating the workflows into the build
>>>>>>>> system won't happen until after the release. At that time we'll move
>>>>>>>> the repositories from svn to git, and it's likely we'll have to ask
>>>>>>>> maintainers to abide by some time/space guidelines. At that point the
>>>>>>>> build machines will be building software, experimental data and
>>>>>>>> workflows, and resources aren't unlimited. When that time comes we'll
>>>>>>>> update the workflow guidelines and contact maintainers.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Valerie
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>>>>>>>
>>>>>>>> yeah, that is super super useful to people. In my vignettes (granted,
>>>>>>>> not workflows) I have a separate "Dependencies" section which is
>>>>>>>> basically a series of library() calls.
>>>>>>>>
>>>>>>>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun
>>>>>>>> <alun at wehi.edu.au> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Nitesh, list;
>>>>>>>>
>>>>>>>>
>>>>>>>> The library() calls in the simpleSingleCell workflow have been removed.
>>>>>>>> Why is this? I find explicit library() calls to be quite useful for
>>>>>>>> readers of the compiled vignette, because they make it easier for them
>>>>>>>> to determine the packages that are required to adapt parts of the
>>>>>>>> workflow for their own analyses. If it doesn't hurt the build system, I
>>>>>>>> would prefer to have these library() calls in the vignette.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This email message may contain legally privileged and/or confidential
>>>>>>>> information.  If you are not the intended recipient(s), or the employee
>>>>>>>> or agent responsible for the delivery of this message to the intended
>>>>>>>> recipient(s), you are hereby notified that any disclosure, copying,
>>>>>>>> distribution, or use of this email message is prohibited.  If you have
>>>>>>>> received this message in error, please notify the sender immediately by
>>>>>>>> e-mail and delete this email message from your computer. Thank you.
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>> --
>>>> With thanks in advance-
>>>> Wolfgang
>>>>
>>>> -------
>>>> Wolfgang Huber
>>>> Principal Investigator, EMBL Senior Scientist
>>>> European Molecular Biology Laboratory (EMBL)
>>>> Heidelberg, Germany
>>>>
>>>> wolfgang.huber at embl.de
>>>> http://www.huber.embl.de
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>



