[Bioc-devel] library() calls removed in simpleSingleCell workflow

Wolfgang Huber wolfgang.huber at embl.de
Fri Oct 6 09:12:08 CEST 2017


On Mac OS X 10.12.6:
$ ulimit -Sn
256

so the maximum value of R_MAX_NUM_DLLS is 153 ...
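
A minimal shell sketch of that arithmetic, assuming the upper bound really is
min(0.6*fd_limit, 1000) rounded down, as Henrik describes below. The function
name max_dlls is hypothetical (it is not something R provides), and treating
an "unlimited" soft limit as the 1000 cap is likewise an assumption:

```shell
#!/bin/sh
# Sketch only: largest R_MAX_NUM_DLLS value under the rule quoted in this
# thread, min(0.6 * fd_limit, 1000), rounded down.
# max_dlls is a hypothetical helper name, not part of R.
max_dlls() {
    fd_limit=$1
    # Assumption: with no file-descriptor limit, only the cap of 1000 applies.
    if [ "$fd_limit" = "unlimited" ]; then
        echo 1000
        return
    fi
    limit=$(( fd_limit * 6 / 10 ))   # integer division floors 0.6 * fd_limit
    if [ "$limit" -gt 1000 ]; then
        limit=1000
    fi
    echo "$limit"
}

max_dlls 256     # soft limit reported above; prints 153
max_dlls 1024    # soft limit in the quoted Ubuntu example; prints 614
```

To apply it to the current shell, something like
R_MAX_NUM_DLLS=$(max_dlls "$(ulimit -Sn)") R should work, provided the
variable can be set in that environment at all.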

	Wolfgang

On 5.10.17 23:02, Henrik Bengtsson wrote:
> About the DLL limit:
> 
> Just wanna make sure you're aware of the "new" environment variable
> R_MAX_NUM_DLLS available in R (>= 3.4.0).  It allows you to push the
> current default limit of 100 open DLLs a bit higher.  It can be set in
> .Renviron or on the command line before starting R, e.g.
> 
> $ R_MAX_NUM_DLLS=500 R
> 
> This, of course, assumes that you can set it, which you might not be
> able to do on build servers.  Also, there is an upper limit
> min(0.6*fd_limit,1000) that depends on the number of files you can
> have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
> got:
> 
> $ ulimit -Sn
> 1024
> 
> so R_MAX_NUM_DLLS=614 is the maximum for me.
> 
> /Henrik
> 
> On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
>>
>> Breaking up long workflows into several smaller "modules" each with a
>> clearly defined input and output is a good idea, certainly for didactic &
>> maintenance reasons.
>>
>> It doesn't "solve" the DLL issue though, it only avoids it (for now)...
>>
>> I believe you can use a Makefile for your vignettes
>> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
>> and this might be a good way of managing which depends on which. For passing
>> along output/input, perhaps local .RData files are good enough, perhaps some
>> wheel-reinventing can also be avoided by using
>> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
>> (haven't actually used it yet, though).
>>
>>          Wolfgang
>>
>>
>>
>> On 5.10.17 20:02, Aaron Lun wrote:
>>>
>>> This may relate to what I was thinking with respect to solving the DLL
>>> problem, by breaking up large workflows into modules that can be executed in
>>> separate R sessions. The same approach would also make it easier to
>>> associate package dependencies with specific parts of the workflow.
>>>
>>>
>>> In my particular situation, it is easy to break up the workflow into
>>> sections that can be executed completely independently. However, I can also
>>> imagine situations where dependencies on previous objects, etc. make it
>>> difficult to break up the workflow. If multiple files are present in
>>> vignettes/, can they be directed to execute in a specific order, and would
>>> output files from one vignette persist during the execution of another?
>>>
>>>
>>> -Aaron
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
>>> *Sent:* Thursday, 5 October 2017 6:23:47 PM
>>> *To:* Laurent Gatto; Aaron Lun
>>> *Cc:* bioc-devel at r-project.org
>>> *Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
>>> workflow
>>>
>>>
>>> I agree it is nice to be able to only load the packages needed for a
>>> certain section of a vignette and not the whole thing. And that too many
>>> `::` can make code look unwieldy (though some may actually increase
>>> readability).
>>>
>>> But relying on manually sprinkled in `library` calls seems like a hack
>>> prone to error. And there are always bound to be dependencies that are
>>> non-local, e.g. on general infrastructure like SummarizedExperiment,
>>> ggplot2, dplyr.
>>>
>>> So: do we need a way to computationally determine the dependencies of a
>>> vignette section, including highlighting/eliminating potential name
>>> clashes (b/c the warnings about masking emitted at package loading are
>>> easily ignored)? This seems like a straightforward engineering task.
>>>
>>> Eventually with such code analysis we could get rid of explicit
>>> `library` calls altogether :)
>>>
>>>           Wolfgang
>>>
>>>
>>>
>>>
>>>
>>> On 5.10.17 08:53, Laurent Gatto wrote:
>>>>
>>>>
>>>> On  5 October 2017 00:11, Aaron Lun wrote:
>>>>
>>>>> Here's another two cents from me:
>>>>>
>>>>> The explicit library() calls allow for easy copy-pasting if people
>>>>> only want to use/adapt a section of the workflow. In such cases,
>>>>> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
>>>>> packages (e.g., which could hit the DLL limit). Reading through the
>>>>> text to figure out the requirements for each code chunk seems like a
>>>>> pain, and lots of "::" are unwieldy.
>>>>>
>>>>> More generally, the removal of individual library() calls seems to
>>>>> encourage the use of a single "library(simpleSingleCell)" call at the
>>>>> top of any user-developed custom analysis scripts based on the
>>>>> workflow. This seems conceptually odd to me - the simpleSingleCell
>>>>> package is simply a vehicle for the compiled workflow, it shouldn't be
>>>>> involved in analyses of other data.
>>>>
>>>>
>>>> I can confirm that this is a possibility.
>>>>
>>>> Before workflows became available, I created the RforProteomics package
>>>> that essentially provided one relatively large vignette to demonstrate a
>>>> variety of applications of R/Bioconductor for mass spectrometry and
>>>> proteomics. I think this has been a useful way to disseminate R and
>>>> Bioconductor in these respective communities, but it also led to the
>>>> confusion that it was that package that "did all the stuff", i.e. people
>>>> saying that they were using RforProteomics to do a task that was
>>>> described in the vignette. The RforProteomics vignette does explicitly
>>>> call library() at the beginning of each section and explains that the
>>>> package is only a collection of analyses stemming from other packages,
>>>> but that wasn't enough, apparently.
>>>>
>>>> Laurent
>>>>
>>>>
>>>>> -Aaron
>>>>>
>>>>> ________________________________
>>>>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
>>>>> Wolfgang Huber <wolfgang.huber at embl.de>
>>>>> Sent: Thursday, 5 October 2017 8:26 AM
>>>>> To: bioc-devel at r-project.org
>>>>> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell
>>>>> workflow
>>>>>
>>>>>
>>>>> I find `eval=FALSE` chunks not a good idea, since
>>>>> - they confuse users who only see the rendered HTML/PDF (where this flag
>>>>> is not shown)
>>>>> - they are not tested, so more prone to code rot.
>>>>>
>>>>> I'd also like to object to the idea that proximity of a `library` call
>>>>> to code that uses a package is somehow didactic. It's actually a bad
>>>>> habit: the R interpreter does not care. The relevant package
>>>>> - can be mentioned in the narrative,
>>>>> - stated in the code with the pkgname:: prefix.
>>>>> The latter is good didactics to get people used to the idea of
>>>>> namespaces, especially since there is an increasing frequency of name
>>>>> clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
>>>>> named 'filter' and the obscure malbehaviors that can result from these).
>>>>>
>>>>> Best wishes
>>>>>                    Wolfgang
>>>>>
>>>>> On 04/10/2017 22:20, Turaga, Nitesh wrote:
>>>>>>
>>>>>> Hi Aaron,
>>>>>>
>>>>>>
>>>>>> A workaround may be to put all the library() calls in an "eval=FALSE"
>>>>>> R code chunk:
>>>>>>
>>>>>> ```{r, eval=FALSE}
>>>>>> library(scran)
>>>>>> library(scater)
>>>>>> ```
>>>>>>
>>>>>> etc.
>>>>>>
>>>>>>
>>>>>> This way the users can see the library() calls in the vignette.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Nitesh
>>>>>>
>>>>>>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
>>>>>>> <Valerie.Obenchain at RoswellPark.org> wrote:
>>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> A little background on this vignette -> package conversion. The
>>>>>>> workflows were converted to package form because we want to integrate them
>>>>>>> into the nightly build system instead of supporting separate machines as
>>>>>>> we're now doing.
>>>>>>>
>>>>>>> As part of this conversion, packages loaded in workflow vignettes were
>>>>>>> moved to Depends in DESCRIPTION. This enables the user to load a single
>>>>>>> package instead of many. Packages were moved to Depends instead of Suggests
>>>>>>> (as is usually done with software packages) because the vignette is the
>>>>>>> only thing these workflow packages have going - no defined classes or
>>>>>>> methods. This seemed a more tidy approach and the dependencies are listed
>>>>>>> in Depends for the user to see. This was my (maybe bad?) idea and Nitesh
>>>>>>> was the messenger. If you feel the individual loading of packages in the
>>>>>>> vignette is a key part of the instruction/learning we can leave them as is
>>>>>>> and list the packages in Suggests.
>>>>>>>
>>>>>>>
>>>>>>> I should also mention that incorporating the workflows into the build
>>>>>>> system won't happen until after the release. At that time we'll move the
>>>>>>> repositories from svn to git and it's likely we'll have to ask maintainers
>>>>>>> to abide by some time/space guidelines. At that point the build machines
>>>>>>> will be building software, experimental data and workflows, and resources
>>>>>>> aren't unlimited. When that time comes we'll update the workflow guidelines
>>>>>>> and contact maintainers.
>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Valerie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>>>>>>
>>>>>>> yeah, that is super super useful to people. In my vignettes (granted, not
>>>>>>> workflows) I have a separate "Dependencies" section which is basically a
>>>>>>> series of library() calls.
>>>>>>>
>>>>>>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <alun at wehi.edu.au> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dear Nitesh, list;
>>>>>>>
>>>>>>>
>>>>>>> The library() calls in the simpleSingleCell workflow have been removed.
>>>>>>> Why is this? I find explicit library() calls to be quite useful for readers
>>>>>>> of the compiled vignette, because they make it easier for them to determine
>>>>>>> the packages that are required to adapt parts of the workflow for their own
>>>>>>> analyses. If it doesn't hurt the build system, I would prefer to have these
>>>>>>> library() calls in the vignette.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>> Aaron
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>> With thanks in advance-
>>> Wolfgang
>>>
>>> -------
>>> Wolfgang Huber
>>> Principal Investigator, EMBL Senior Scientist
>>> European Molecular Biology Laboratory (EMBL)
>>> Heidelberg, Germany
>>>
>>> wolfgang.huber at embl.de
>>> http://www.huber.embl.de

-- 
With thanks in advance-
Wolfgang

-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.huber at embl.de
http://www.huber.embl.de


