[Bioc-devel] Controlling vignette compilation order

Tue Dec 18 17:51:50 CET 2018

Hi Aaron,

Right now 'R CMD build' evaluates all vignettes in the same R session. 
Personally I see this as an undesirable feature and hope that it will 
change in the future. One problem is that when a vignette hits the 
max DLL limit, breaking it down into smaller vignettes doesn't help, 
because all of them still load their DLLs into the same session. 
Another problem is that sometimes using 'R CMD Stangle && source()' does 
not reproduce a bug triggered by 'R CMD build'. I can spend a lot of 
time scratching my head on this until I finally realize that I first 
have to evaluate one of the other vignettes in order to reproduce the bug.
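
A rough sketch of that reproduction step, assuming Sweave (.Rnw) 
vignettes in a 'vignettes' directory and assuming they are evaluated in 
file-name order. The function name run_vignettes_in_order is 
hypothetical, and this only approximates what 'R CMD build' does. For 
.Rmd vignettes, something like knitr::purl() would be needed in place 
of Stangle().

```r
# Hypothetical helper: tangle every vignette and source the extracted
# code in one R session, so that state left behind by an earlier
# vignette (objects, loaded DLLs) is visible to the later ones.
run_vignettes_in_order <- function(dir = "vignettes", pattern = "\\.Rnw$") {
    # Assumption: evaluation order follows the sorted file names.
    vigs <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
    for (v in vigs) {
        # Print the vignette name so a failure can be attributed to a file.
        message("Evaluating ", basename(v))
        script <- tempfile(fileext = ".R")
        # Extract the code chunks into a temporary script.
        utils::Stangle(v, output = script, quiet = TRUE)
        # local = FALSE evaluates in the global environment, so objects
        # accumulate across vignettes.
        source(script, local = FALSE)
    }
    invisible(vigs)
}
```

Calling run_vignettes_in_order() from the package's top-level directory 
in a fresh R session would then evaluate the vignettes one after another.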

On this note I wish 'R CMD build' would show progress by printing the 
name of the vignette it's currently evaluating (like 'R CMD check' does 
during the 'checking running R code from vignettes' step). Should be an 
easy improvement and it would already help a lot.

That being said I'm also sympathetic to your use case where sometimes a 
big monolithic vignette needs to be broken down into smaller units. I 
don't know of any way to control the order of evaluation other than 
using a Makefile for that though.

H.

On 12/18/18 06:58, Aaron Lun wrote:
> @Michael In this case, the resource produced by vignette X is a SingleCellExperiment object containing the results of various processing steps (normalization, clustering, etc.) described in that vignette.
>
> I can imagine a lazy evaluation model for this, but it wouldn’t be pretty. If I had another vignette Y that depended on the SCE produced by vignette X, I would need Y to execute all of the steps in X if X hadn’t already been run before Y. This gets us into the territory of Makefile-like dependencies, which seems even more complicated than simply specifying a compilation order.
>
> You might ask why X and Y are split into two separate vignettes. The use of different vignettes is motivated by the complexity of the workflows:
>
> - Vignette 1 demonstrates core processing steps for one read-based single-cell RNAseq dataset.
> - Vignette 2 demonstrates (slightly different) core steps for a UMI-based dataset.
> - … and so on for a bunch of other core steps for different types of data.
> - Vignette 6 demonstrates extra optional steps for the two SCEs produced by vignettes 1 & 3.
> - … and so on for a bunch of other optional steps.
>
> The separation of core and optional steps into separate documents is desirable. From a pedagogical perspective, I would very much like to get the reader through all the core steps before even considering the extra steps, which would just be confusing if presented so early on. Previously, everything was in a single document, which was difficult to read (for users) and to debug (for me), especially because I had to use contrived variable names to avoid clashes between different sections of the workflow that did similar things.
>
> @Martin I’ve been using BiocFileCache for all of the online resources that are used in the workflow. However, this is only for my (and the reader’s) convenience. I use a local cache rather than the system default, to ensure that the downloaded files are removed after package build. This is intentional as it forces the package builder to try to re-download resources when compiling the vignette, thus ensuring the validity of the URLs. For a similar reason, I would prefer not to cache the result objects for use in different R sessions. I could imagine caching the result objects for use by a different vignette in the same build session, but this gets back to the problem of ensuring that the result object is generated by one vignette before it is needed by another vignette.
>
> -A
>
>> On 18 Dec 2018, at 14:14, Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
>>
>> Also perhaps using BiocFileCache so that the result object is only generated once, then cached for future (different session) use.
>>
>> On 12/18/18, 8:35 AM, "Bioc-devel on behalf of Michael Lawrence" <bioc-devel-bounces using r-project.org on behalf of lawrence.michael using gene.com> wrote:
>>
>>     I would recommend against dependencies across vignettes. Ideally someone
>>     can pick up a vignette and execute the code independently of any other
>>     documentation. Perhaps you could move the code generating those shared
>>     resources to the package. They could behave lazily, only generating the
>>     resource if necessary, otherwise reusing it. That would also make it easy
>>     for people to write their own documents using those resources.
>>
>>     Michael
>>
>>     On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <
>>     infinite.monkeys.with.keyboards using gmail.com> wrote:
>>
>>> In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
>>> specific compilation order for my vignettes. This is because some vignettes
>>> set up resources or objects that are to be used by later vignettes.
>>>
>>> From what I understand, vignettes are compiled in alphanumeric ordering of
>>> their file names. As such, I give my vignettes fairly structured names,
>>> e.g., “work-1-reads.Rmd”, “work-2-umi.Rmd” and so on.
>>>
>>> However, it becomes rather annoying when I want to add a new vignette in
>>> the middle somewhere. This results in some unnatural numberings, e.g.,
>>> “work-0”, “3b”, which are ugly and unintuitive. This is relevant as
>>> BiocStyle::Biocpkg() links between vignettes require you to use the
>>> destination vignette’s file name; so difficult names complicate linking,
>>> especially if the names continually change to reflect new orderings.
>>>
>>> Is there an easier way to control vignette compilation order? WRE provides
>>> no (obvious) guidance, so I would like to know what non-standard hacks are
>>> known to work on the build machines. I can imagine something dirty whereby
>>> one “reference” vignette contains code to “rmarkdown::render” all other
>>> vignettes in the specified order… ugh.
>>>
>>> -A
>>>
>>> _______________________________________________
>>> Bioc-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319