[Bioc-devel] library() calls removed in simpleSingleCell workflow
Henrik Bengtsson
henrik.bengtsson at gmail.com
Thu Oct 5 23:02:57 CEST 2017
About the DLL limit:
Just want to make sure you're aware of the "new" environment variable
R_MAX_NUM_DLLS, available in R (>= 3.4.0). It allows you to push the
current default limit of 100 open DLLs a bit higher. It can be set in
.Renviron or on the command line when launching R, e.g.
$ R_MAX_NUM_DLLS=500 R
This, of course, assumes that you can set it, which you might not be
able to do on build servers. Also, there is an upper limit
min(0.6*fd_limit,1000) that depends on the number of files you can
have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've
got:
$ ulimit -Sn
1024
so R_MAX_NUM_DLLS=614 is the maximum for me.
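A quick way to see how close a session is to that limit, as a minimal sketch assuming R >= 3.4.0: getLoadedDLLs() and Sys.getenv() are base R, while dll_cap() is a hypothetical helper that only evaluates the min(0.6*fd_limit,1000) bound quoted above.

```r
# Number of DLLs currently loaded in this session (base R).
n_loaded <- length(getLoadedDLLs())

# R_MAX_NUM_DLLS as seen by this session ("" if it was not set,
# in which case the default limit of 100 applies).
max_env <- Sys.getenv("R_MAX_NUM_DLLS")

# Hypothetical helper: the upper bound min(0.6 * fd_limit, 1000), where
# fd_limit is the soft limit on open files reported by `ulimit -Sn`.
dll_cap <- function(fd_limit) {
  as.integer(min(floor(0.6 * fd_limit), 1000))
}

cat("DLLs loaded:", n_loaded, "\n")
cat("R_MAX_NUM_DLLS:",
    if (nzchar(max_env)) max_env else "(unset, default 100)", "\n")
dll_cap(1024)  # 614, matching the Ubuntu 16.04 example above
```

Raising fd_limit beyond about 1667 no longer helps, since the bound is capped at 1000.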
/Henrik
On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
>
> Breaking up long workflows into several smaller "modules" each with a
> clearly defined input and output is a good idea, certainly for didactic &
> maintenance reasons.
>
> It doesn't "solve" the DLL issue though, it only avoids it (for now)...
>
> I believe you can use a Makefile for your vignettes
> (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes),
> and this might be a good way of managing which depends on which. For passing
> along output/input, perhaps local .RData files are good enough, perhaps some
> wheel-reinventing can also be avoided by using
> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
> (haven't actually used it yet, though).
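Passing output from one module to the next via local files can be sketched with base R alone. This assumes plain saveRDS()/readRDS() rather than BiocFileCache, and the file name and stand-in data are hypothetical; the second module is run in a fresh session via Rscript, to mimic separate vignettes.

```r
# Hypothetical hand-off file shared by the two modules.
handoff <- file.path(tempdir(), "module1-output.rds")

## --- module 1 (e.g. the first vignette or script) ---
set.seed(1)
counts <- matrix(rpois(20, lambda = 5), nrow = 4)  # stand-in for real results
saveRDS(counts, handoff)

## --- module 2, run in a separate R session via Rscript ---
rscript <- file.path(R.home("bin"), "Rscript")
expr <- paste("x <- readRDS(commandArgs(trailingOnly = TRUE)[1]);",
              "writeLines(as.character(sum(x)))")
out <- system2(rscript, c("-e", shQuote(expr), shQuote(handoff)),
               stdout = TRUE)

# The fresh session read back exactly what module 1 wrote.
as.numeric(out) == sum(counts)
```

In a real package the hand-off file would need to live somewhere that persists between vignette builds, which tempdir() does not guarantee.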
>
> Wolfgang
>
>
>
> 5.10.17 20:02, Aaron Lun scripsit:
>>
>> This may relate to what I was thinking with respect to solving the DLL
>> problem, by breaking up large workflows into modules that can be executed in
>> separate R sessions. The same approach would also make it easier to
>> associate package dependencies with specific parts of the workflow.
>>
>>
>> In my particular situation, it is easy to break up the workflow into
>> sections that can be executed completely independently. However, I can also
>> imagine situations where dependencies on previous objects, etc. make it
>> difficult to break up the workflow. If multiple files are present in
>> vignettes/, can they be directed to execute in a specific order, and would
>> output files from one vignette persist during the execution of another?
>>
>>
>> -Aaron
>>
>> ------------------------------------------------------------------------
>> *From:* Wolfgang Huber <wolfgang.huber at embl.de>
>> *Sent:* Thursday, 5 October 2017 6:23:47 PM
>> *To:* Laurent Gatto; Aaron Lun
>> *Cc:* bioc-devel at r-project.org
>> *Subject:* Re: [Bioc-devel] library() calls removed in simpleSingleCell
>> workflow
>>
>>
>> I agree it is nice to be able to only load the packages needed for a
>> certain section of a vignette and not the whole thing. And that too many
>> `::` can make code look unwieldy (though some may actually increase
>> readability).
>>
>> But relying on manually sprinkled in `library` calls seems like a hack
>> prone to error. And there are always bound to be dependencies that are
>> non-local, e.g. on general infrastructure like SummarizedExperiment,
>> ggplot2, dplyr.
>>
>> So: do we need a way to computationally determine the dependencies of a
>> vignette section, including highlighting/eliminating potential name
>> clashes (b/c the warnings about masking emitted at package loading are
>> easily ignored)? This seems like a straightforward engineering task.
>>
>> Eventually with such code analysis we could get rid of explicit
>> `library` calls altogether :)
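One rough way to approach this, as a sketch only: parse a section's code, collect every symbol with all.names(), and look each one up in the exports of a list of candidate packages. section_deps() is a hypothetical helper, not an existing tool; it ignores S4 methods, non-standard evaluation, and local variables that happen to share a name with an export, so its output is a starting point rather than a definitive answer.

```r
# Hypothetical helper: which candidate packages export the symbols used
# in `code`, and which symbols are exported by more than one of them?
section_deps <- function(code, candidates) {
  exprs <- parse(text = code)
  # Every symbol appearing in the code, including function names.
  used <- unique(unlist(lapply(exprs, all.names)))
  # For each candidate package, the used symbols it exports.
  hits <- lapply(candidates,
                 function(pkg) intersect(used, getNamespaceExports(pkg)))
  names(hits) <- candidates
  # Map each matched symbol to the package(s) exporting it.
  owners <- split(rep(candidates, lengths(hits)),
                  unlist(hits, use.names = FALSE))
  list(
    packages = candidates[lengths(hits) > 0],
    clashes  = owners[lengths(owners) > 1]
  )
}

res <- section_deps("y <- filter(1:10, rep(1/3, 3)); head(y)",
                    candidates = c("stats", "utils"))
res$packages  # "stats" "utils"
res$clashes   # empty: no symbol is exported by both
```

Adding "dplyr" to `candidates` (if it is installed) would report `filter` under `clashes`, since both stats and dplyr export a function of that name.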
>>
>> Wolfgang
>>
>>
>>
>>
>>
>> 5.10.17 08:53, Laurent Gatto scripsit:
>>>
>>>
>>> On 5 October 2017 00:11, Aaron Lun wrote:
>>>
>>>> Here's another two cents from me:
>>>>
>>>> The explicit library() calls allow for easy copy-pasting if people
>>>> only want to use/adapt a section of the workflow. In such cases,
>>>> calling "library(simpleSingleCell)" could drag in a lot of unnecessary
>>>> packages (e.g., which could hit the DLL limit). Reading through the
>>>> text to figure out the requirements for each code chunk seems like a
>>>> pain, and lots of "::" are unwieldy.
>>>>
>>>> More generally, the removal of individual library() calls seems to
>>>> encourage the use of a single "library(simpleSingleCell)" call at the
>>>> top of any user-developed custom analysis scripts based on the
>>>> workflow. This seems conceptually odd to me - the simpleSingleCell
>>>> package is simply a vehicle for the compiled workflow, it shouldn't be
>>>> involved in analyses of other data.
>>>
>>>
>>> I can confirm that this is a possibility.
>>>
>>> Before workflows became available, I created the RforProteomics package
>>> that essentially provided one relatively large vignette to demonstrate a
>>> variety of applications of R/Bioconductor for mass spectrometry and
>>> proteomics. I think this has been a useful way to disseminate R and
>>> Bioconductor in these respective communities, but also led to the
>>> confusion that it was that package that "did all the stuff", i.e. people
>>> saying that they were using RforProteomics to do a task that was
>>> described in the vignette. The RforProteomics vignette explicitly calls
>>> library() at the beginning of each section and explains that the
>>> package is only a collection of analyses stemming from other packages,
>>> but that apparently wasn't enough.
>>>
>>> Laurent
>>>
>>>
>>>> -Aaron
>>>>
>>>> ________________________________
>>>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
>>>> Wolfgang Huber <wolfgang.huber at embl.de>
>>>> Sent: Thursday, 5 October 2017 8:26 AM
>>>> To: bioc-devel at r-project.org
>>>> Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell
>>>> workflow
>>>>
>>>>
>>>> I find `eval=FALSE` chunks not a good idea, since
>>>> - they confuse users who only see the rendered HTML/PDF (where this flag
>>>> is not shown)
>>>> - they are not tested, so more prone to code rot.
>>>>
>>>> I'd also like to object to the idea that proximity of a `library` call
>>>> to code that uses a package is somehow didactic. It's actually a bad
>>>> habit: the R interpreter does not care. The relevant package
>>>> - can be mentioned in the narrative,
>>>> - stated in the code with the pkgname:: prefix.
>>>> The latter is good didactics to get people used to the idea of
>>>> namespaces, especially since there is an increasing frequency of name
>>>> clashes in CRAN, tidyverse, BioC (e.g. consider the various functions
>>>> named 'filter' and the obscure misbehaviors that can result from these).
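To make the `filter` example concrete, a minimal sketch using only stats (dplyr is mentioned but not loaded): stats::filter() applies a linear filter to a series, whereas dplyr::filter() subsets rows of a data frame. An unqualified `filter` resolves to whichever package comes first on the search path, so it depends on the order of library() calls; the pkgname:: prefix removes that ambiguity.

```r
x <- 1:6
# Centred 3-point moving average; the ends are NA.
stats::filter(x, rep(1/3, 3))  # NA 2 3 4 5 NA (printed as a ts object)

# Which package does an unqualified `filter` currently come from?
# "stats" in a default session; "dplyr" after library(dplyr).
environmentName(environment(filter))
```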
>>>>
>>>> Best wishes
>>>> Wolfgang
>>>>
>>>> On 04/10/2017 22:20, Turaga, Nitesh wrote:
>>>>>
>>>>> Hi Aaron,
>>>>>
>>>>>
>>>>> A workaround may be to put all the library() calls in an R code chunk
>>>>> with eval=FALSE:
>>>>>
>>>>> ```{r, eval=FALSE}
>>>>> library(scran)
>>>>> library(scater)
>>>>> ```
>>>>>
>>>>> etc.
>>>>>
>>>>>
>>>>> This way the users can see the library() calls in the vignette.
>>>>>
>>>>> Best,
>>>>>
>>>>> Nitesh
>>>>>
>>>>>> On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie
>>>>>> <Valerie.Obenchain at RoswellPark.org> wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> A little background on this vignette -> package conversion. The
>>>>>> workflows were converted to package form because we want to integrate them
>>>>>> into the nightly build system instead of supporting separate machines as
>>>>>> we're now doing.
>>>>>>
>>>>>> As part of this conversion, packages loaded in workflow vignettes were
>>>>>> moved to Depends in DESCRIPTION. This enables the user to load a single
>>>>>> package instead of many. Packages were moved to Depends instead of Suggests
>>>>>> (as is usually done with software packages) because the vignette is the
>>>>>> only thing these workflow packages have going - no defined classes or
>>>>>> methods. This seemed a more tidy approach and the dependencies are listed
>>>>>> in Depends for the user to see. This was my (maybe bad?) idea and Nitesh
>>>>>> was the messenger. If you feel the individual loading of packages in the
>>>>>> vignette is a key part of the instruction/learning we can leave them as
>>>>>> is and list the packages in Suggests.
>>>>>>
>>>>>>
>>>>>> I should also mention that incorporating the workflows into the build
>>>>>> system won't happen until after the release. At that time we'll move the
>>>>>> repositories from svn to git and it's likely we'll have to ask maintainers
>>>>>> to abide by some time/space guidelines. At that point the build machines
>>>>>> will be building software, experimental data and workflows, and
>>>>>> resources aren't unlimited. When that time comes we'll update the
>>>>>> workflow guidelines and contact maintainers.
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>> Valerie
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
>>>>>>
>>>>>> yeah, that is super super useful to people. In my vignettes (granted,
>>>>>> not workflows) I have a separate "Dependencies" section which is
>>>>>> basically a series of library() calls.
>>>>>>
>>>>>> On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun
>>>>>> <alun at wehi.edu.au> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dear Nitesh, list;
>>>>>>
>>>>>>
>>>>>> The library() calls in the simpleSingleCell workflow have been removed.
>>>>>> Why is this? I find explicit library() calls to be quite useful for
>>>>>> readers of the compiled vignette, because they make it easier to
>>>>>> determine the packages that are required to adapt parts of the workflow
>>>>>> for their own analyses. If it doesn't hurt the build system, I would
>>>>>> prefer to have these library() calls in the vignette.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> This email message may contain legally privileged and/or confidential
>>>>>> information. If you are not the intended recipient(s), or the employee or
>>>>>> agent responsible for the delivery of this message to the intended
>>>>>> recipient(s), you are hereby notified that any disclosure, copying,
>>>>>> distribution, or use of this email message is prohibited. If you have
>>>>>> received this message in error, please notify the sender immediately by
>>>>>> e-mail and delete this email message from your computer. Thank you.
>>>>>>
>> --
>> With thanks in advance-
>> Wolfgang
>>
>> -------
>> Wolfgang Huber
>> Principal Investigator, EMBL Senior Scientist
>> European Molecular Biology Laboratory (EMBL)
>> Heidelberg, Germany
>>
>> wolfgang.huber at embl.de
>> http://www.huber.embl.de