[Bioc-devel] Controlling vignette compilation order

Michael Lawrence lawrence.michael using gene.com
Tue Dec 18 18:41:31 CET 2018


Sounds like a use case for drake...
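
A minimal sketch of how that might look, assuming the drake package is
installed. The functions process_reads() and run_optional_steps() are
hypothetical stand-ins for the steps in vignettes X and Y, not code from
simpleSingleCell:

```r
library(drake)

# Hypothetical stand-ins for the processing done in each vignette.
# A real workflow would return a SingleCellExperiment instead of a list.
process_reads <- function() list(dataset = "reads", normalized = TRUE)
run_optional_steps <- function(sce) c(sce, list(extras = TRUE))

# Each target is defined by a command. Because the command for 'extras'
# mentions 'sce_reads', drake infers that 'extras' depends on it.
plan <- drake_plan(
  sce_reads = process_reads(),
  extras = run_optional_steps(sce_reads)
)

# Builds outdated targets in dependency order and stores them in a
# cache (by default a '.drake' directory in the working directory).
make(plan)

# Retrieve a finished target from the cache.
result <- readd(extras)
```

The dependency between the two steps is declared once in the plan, so no
ordering needs to be encoded in file names.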

On Tue, Dec 18, 2018 at 6:58 AM Aaron Lun <
infinite.monkeys.with.keyboards using gmail.com> wrote:

> @Michael In this case, the resource produced by vignette X is a
> SingleCellExperiment object containing the results of various processing
> steps (normalization, clustering, etc.) described in that vignette.
>
> I can imagine a lazy evaluation model for this, but it wouldn’t be pretty.
> If I had another vignette Y that depended on the SCE produced by vignette
> X, I would need Y to execute all of the steps in X if X hadn’t already been
> run before Y. This gets us into the territory of Makefile-like
> dependencies, which seems even more complicated than simply specifying a
> compilation order.
>
> You might ask why X and Y are split into two separate vignettes. The use
> of different vignettes is motivated by the complexity of the workflows:
>
> - Vignette 1 demonstrates core processing steps for one read-based
> single-cell RNAseq dataset.
> - Vignette 2 demonstrates (slightly different) core steps for a UMI-based
> dataset.
> - … and so on for a bunch of other core steps for different types of data.
> - Vignette 6 demonstrates extra optional steps for the two SCEs produced
> by vignettes 1 & 3.
> - … and so on for a bunch of other optional steps.
>
> The separation of core and optional steps into separate documents is
> desirable. From a pedagogical perspective, I would very much like to get
> the reader through all the core steps before even considering the extra
> steps, which would just be confusing if presented so early on. Previously,
> everything was in a single document, which was difficult to read (for
> users) and to debug (for me), especially because I had to use contrived
> variable names to avoid clashes between different sections of the workflow
> that did similar things.
>
> @Martin I’ve been using BiocFileCache for all of the online resources that
> are used in the workflow. However, this is only for my (and the reader’s)
> convenience. I use a local cache rather than the system default, to ensure
> that the downloaded files are removed after package build. This is
> intentional as it forces the package builder to try to re-download
> resources when compiling the vignette, thus ensuring the validity of the
> URLs. For a similar reason, I would prefer not to cache the result objects
> for use in different R sessions. I could imagine caching the result objects
> for use by a different vignette in the same build session, but this gets
> back to the problem of ensuring that the result object is generated by one
> vignette before it is needed by another vignette.
>
> -A
>
> > On 18 Dec 2018, at 14:14, Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
> >
> > Also perhaps using BiocFileCache so that the result object is only
> > generated once, then cached for future (different session) use.
> >
> > On 12/18/18, 8:35 AM, "Bioc-devel on behalf of Michael Lawrence"
> > <bioc-devel-bounces using r-project.org on behalf of
> > lawrence.michael using gene.com> wrote:
> >
> >    I would recommend against dependencies across vignettes. Ideally someone
> >    can pick up a vignette and execute the code independently of any other
> >    documentation. Perhaps you could move the code generating those shared
> >    resources to the package. They could behave lazily, only generating the
> >    resource if necessary, otherwise reusing it. That would also make it easy
> >    for people to write their own documents using those resources.
> >
> >    Michael
> >
> >    On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <
> >    infinite.monkeys.with.keyboards using gmail.com> wrote:
> >
> >> In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
> >> specific compilation order for my vignettes. This is because some vignettes
> >> set up resources or objects that are to be used by later vignettes.
> >>
> >> From what I understand, vignettes are compiled in alphanumeric ordering of
> >> their file names. As such, I give my vignettes fairly structured names,
> >> e.g., “work-1-reads.Rmd”, “work-2-umi.Rmd” and so on.
> >>
> >> However, it becomes rather annoying when I want to add a new vignette in
> >> the middle somewhere. This results in some unnatural numberings, e.g.,
> >> “work-0”, “3b”, which are ugly and unintuitive. This is relevant as
> >> BiocStyle::Biocpkg() links between vignettes require you to use the
> >> destination vignette’s file name; so difficult names complicate linking,
> >> especially if the names continually change to reflect new orderings.
> >>
> >> Is there an easier way to control vignette compilation order? WRE provides
> >> no (obvious) guidance, so I would like to know what non-standard hacks are
> >> known to work on the build machines. I can imagine something dirty whereby
> >> one “reference” vignette contains code to “rmarkdown::render” all other
> >> vignettes in the specified order… ugh.
> >>
> >> -A
> >>
> >> _______________________________________________
> >> Bioc-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >>
> >
>
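
The lazy approach discussed in the quoted messages (generate the shared
resource only if necessary, otherwise reuse it) could be sketched in base R
roughly as follows. The helper get_or_build(), the builder build_sce_x() and
the cache location are all hypothetical names chosen for illustration; a real
workflow would return a SingleCellExperiment rather than a plain list.

```r
# Hypothetical helper: return the cached result if it exists, otherwise
# run the builder, save its result, and return it.
get_or_build <- function(name, builder, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (!file.exists(path)) {
    saveRDS(builder(), path)  # only the first caller pays the cost
  }
  readRDS(path)
}

# Stand-in for the processing steps of vignette X. The counter records
# how many times the expensive steps were actually run.
n_builds <- 0
build_sce_x <- function() {
  n_builds <<- n_builds + 1
  list(dataset = "reads", clustered = TRUE)
}

# A fresh temporary directory, so the example starts from an empty cache
# and nothing persists beyond the current R session.
cache <- tempfile("workflow-cache-")

sce_for_x <- get_or_build("sce_x", build_sce_x, cache_dir = cache)  # builds
sce_for_y <- get_or_build("sce_x", build_sce_x, cache_dir = cache)  # reuses
```

With this pattern, whichever vignette runs first builds the object, so the
result no longer depends on compilation order. The second vignette would still
have to run all of X's steps if X had not been compiled yet.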
