[Bioc-devel] Is it OK for Rmd package vignettes to be rendered as PDF?

Henrik Bengtsson henrik.bengtsson at gmail.com
Fri Aug 19 19:21:03 CEST 2016


On Thu, Aug 18, 2016 at 4:45 PM, Wolfgang Huber <whuber at embl.de> wrote:
>
>
>> On 17 Aug 2016, at 13:02, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
>>
>> R CMD build, which is what triggers vignette building, only supports one
>> output file (HTML or PDF) per vignette. It will basically ignore duplicate
>> output formats. This is by design / legacy reasons. Technically it wouldn't
>> be hard to add support for multiple output formats, but that would require
>> changes to R itself - I think it could be a useful feature.
>
> Henrik, I’m sure you have more experience and insight with this than I, but I wonder when (at what stage) and what for R needs to be changed? It seems there are several issues:
> (a) having both the PDF and HTML be built by the build system and be shipped with the package
> (b) making them discoverable on the Bioc package landing page, and on the index page of the R-help system.
> (c) making (a) and (b) easy and standardized for package authors
>
> Re (a), at first sight, it seems that simply adding the YAML lines Ramon mentioned to the vignette will NOT achieve this (it looks like only the first output format stated is produced), but according to
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Writing-package-vignettes
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Non_002dSweave-vignettes
> I expect that with sufficient cleverness with (i) a Makefile and/or (ii) registering your own VignetteBuilder (some wrapper around rmarkdown::render that makes sure both outputs are built, with only one run of the R code) it should be possible to achieve (a).
>
> For something almost as good as (b) [or better?], you could have the HTML indexed, and in it e.g. at the top have a button with a link to the PDF file, for those who want to print it.
>

> For (c), I suppose changing R would be handy. Or BiocStyle?

Just a quick background: I was the one adjusting a large chunk of the
vignette code for R 3.0.0 (Feb-March 2013) in order to add support for
generic vignette engines after Yihui Xie and Duncan Murdoch had laid
the groundwork adding support for knitr.  When doing this I did think
about supporting multiple weave output formats (and also keeping
intermediate TeX / Markdown / ... files). What I can recall from this
is that it shouldn't be too hard to do this.  It wasn't done mostly
because it would have required agreement from others / R core, and
time was very short (just before the R 3.0.0 release), so updates
were kept to a minimum.

Wolfgang, to answer your question: In my previous reply, I was focusing
on the R CMD build process itself because that's where most of the
action happens when it comes to building vignettes, but there are
other parts that need to be updated as well.  The core of the
issue here is that R, or more precisely the tools package, assumes
there should be exactly one output product per vignette.  For
instance, in tools:::find_vignette_product()
[https://github.com/wch/r-source/blob/trunk/src/library/tools/R/Vignettes.R#L80-L84]
we have checks like:

if (length(output) > 2L || (final && length(output) > 1L))
    stop(gettextf("Located more than one %s output file (by engine %s) for vignette with name %s: %s",
                  sQuote(by),
                  sQuote(sprintf("%s::%s", engine$package, engine$name)),
                  sQuote(name),
                  paste(sQuote(output), collapse=", ")),
         domain = NA)

where 'output' holds any matching *.pdf and *.html file (and final =
TRUE).  (In my previous comment I said duplicated outputs would be
ignored, but it seems that there'll be an error instead).
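
To illustrate, the effect of that check can be reproduced with a small
stand-alone function.  This is only a sketch: check_single_output() is a
hypothetical name, the message is simplified, and it is not the code in
the tools package.

```r
# Hypothetical stand-in for the length check quoted above; NOT the tools code.
# 'output' is a character vector of candidate product files for one vignette.
check_single_output <- function(output, final = TRUE) {
  if (length(output) > 2L || (final && length(output) > 1L)) {
    stop(sprintf("Located more than one output file: %s",
                 paste(sQuote(output), collapse = ", ")),
         call. = FALSE)
  }
  invisible(output)
}

# A single product passes silently.
ok <- check_single_output("foo.html")

# Two products with final = TRUE trigger the error, which is caught here.
res <- tryCatch(check_single_output(c("foo.html", "foo.pdf")),
                error = function(e) "error")
```

With final = TRUE, as soon as both foo.html and foo.pdf exist for the
same vignette, the condition is met and an error is raised.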

There is also an internal vignette "meta data" data frame holding the
vignette name, title, weave and tangle output files (or something like
that).  The weave output field is a character vector of one element
per vignette.  This data frame is used in several places.  This has to
be updated such that it can hold more than one weave output file per
vignette, i.e. something like meta$weave[[idx]] should be able to hold
one or more strings.  Then functions / mechanisms that make use of
this meta data need to be adjusted, e.g. vignettes(), vignette(),
functions to build the vignette index HTML page etc.  There probably
also need to be new features added, e.g. what format should be opened
by default when calling vignette()?
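
As a rough sketch of what such a structure could look like, a list
column can hold one or more weave outputs per vignette.  The data frame
below and the helper default_output() are hypothetical illustrations,
not the internal structure used by the tools package, and preferring
HTML by default is just one possible policy.

```r
# Hypothetical meta data: one row per vignette, 'weave' is a list column
# so that each vignette can carry one or more output files.
meta <- data.frame(name = c("intro", "advanced"), stringsAsFactors = FALSE)
meta$weave <- list(c("intro.html", "intro.pdf"), "advanced.pdf")

# Hypothetical policy for choosing what to open by default:
# take the preferred format if present, otherwise the first file listed.
default_output <- function(files, prefer = "html") {
  ext <- tolower(tools::file_ext(files))
  hit <- files[ext == prefer]
  if (length(hit) > 0L) hit[[1L]] else files[[1L]]
}

# Default product for each vignette.
defaults <- vapply(meta$weave, default_output, character(1))
```

Here meta$weave[[1]] holds two files while meta$weave[[2]] holds one,
which is the kind of flexibility the current one-element-per-vignette
field lacks.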

So, again, I think this is fairly straightforward to implement, but
the first step is to convince R core that this should be done.  I
think one strong argument is that PDF alone is a rather bad format for
screen readers while HTML is much better in this respect.  One could
also imagine vignette engines that are designed to provide highly
screen-reader friendly output files / formats in addition to the
standard HTML / PDF formats.  This raises the question whether R
should do this or if that's better left to other software that
converts from the HTML file.  On the other hand, maybe the HTML file doesn't
contain all necessary information and it's better to work off an
intermediate file format.

As Martin points out, preferably vignette engines that output to
multiple formats should be smart enough not to rerun everything from
scratch, but instead generate the PDF and HTML files based on some
intermediate static format (e.g. Markdown).
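
A minimal sketch of that idea, outside of any vignette engine, might
look as follows.  build_both() is a hypothetical helper; it assumes
that knitr, rmarkdown, pandoc and a LaTeX installation are available,
and it uses plain pandoc conversion, so format-specific styling (e.g.
from BiocStyle) would not be applied.

```r
# Sketch only: build_both() is a hypothetical helper, not an existing API.
# Assumes knitr, rmarkdown, pandoc and LaTeX are installed.
build_both <- function(rmd) {
  base <- tools::file_path_sans_ext(basename(rmd))

  # Run the R code once, producing static intermediate Markdown.
  md <- knitr::knit(rmd, output = paste0(base, ".md"), quiet = TRUE)
  md <- normalizePath(md)

  # Convert the Markdown to each final format without re-running any R code.
  rmarkdown::pandoc_convert(md, to = "html", output = paste0(base, ".html"),
                            options = "--standalone")
  rmarkdown::pandoc_convert(md, to = "latex", output = paste0(base, ".pdf"))

  # Names of the two products, written next to the intermediate Markdown.
  paste0(base, c(".html", ".pdf"))
}
```

The point is only that the expensive step (knitting) happens once and
both products are derived from the same Markdown file.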

/Henrik

>
>         Wolfgang
>
>
>>
>
>> A related issue is that some also prefer to have access to the
>> intermediate plain Markdown / TeX rather than only the final HTML / PDF product,
>> e.g. because they work better with screen readers.
>>
>> The only way I see you can have a PDF and a HTML version at the same time
>> is to create two identical vignettes, each outputting a specific format.
>>
>> Henrik
>>
>> On Aug 17, 2016 12:17, "Ramon Diaz-Uriarte" <rdiaz02 at gmail.com> wrote:
>>
>>>
>>> Dear All,
>>>
>>> I am considering rewriting the vignette of one BioC package I maintain as
>>> Rmd (it is currently Rnw). But I would like that the entry under
>>> "Documentation" contain a PDF of the vignette; it can ideally also contain
>>> the HTML version too, but I do not want it to not have the PDF[1].
>>>
>>>
>>> I know I can add entries to the document header such as
>>>
>>> output:
>>>  BiocStyle::pdf_document:
>>>    toc: true
>>>  BiocStyle::html_document:
>>>    toc: true
>>>
>>>
>>> that will, when run locally via "render('file.Rmd', output_format =
>>> 'all')", produce both formats.
>>>
>>>
>>>
>>> I've googled around, but I am not sure about:
>>>
>>> 1. If I have both output formats specified in the document header, will the
>>> BioC page of the package actually show both the PDF and the HTML of the
>>> vignette?
>>>
>>>
>>> 2. Is it OK (in conformity with BioC policies, sensible[1], whatever) to
>>> even try/want this? My reading of the doc for the BiocStyle
>>> (https://www.bioconductor.org/packages/devel/bioc/vignettes/BiocStyle/inst/doc/HtmlStyle.html)
>>> seems to suggest that the "natural" thing for Rmd vignettes is to be
>>> rendered as HTML, but I have not seen that producing PDF is discouraged
>>> explicitly.
>>>
>>>
>>> Best,
>>>
>>>
>>> R.
>>>
>>>
>>> [1] Why do I want to get a PDF if I am using Rmd? I want a PDF because this
>>> is a fairly long document that some users want to be able to print. I want
>>> HTML because some users prefer HTML and because I'd like to also place the
>>> vignette as HTML in Github Pages. I think that the only way to accomplish
>>> both is to use Rmd (not Rnw, even if I really, really, prefer LaTeX :-).
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Ramon Diaz-Uriarte
>>> Department of Biochemistry, Lab B-25
>>> Facultad de Medicina
>>> Universidad Autónoma de Madrid
>>> Arzobispo Morcillo, 4
>>> 28029 Madrid
>>> Spain
>>>
>>> Phone: +34-91-497-2412
>>>
>>> Email: rdiaz02 at gmail.com
>>>       ramon.diaz at iib.uam.es
>>>
>>> http://ligarto.org/rdiaz
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>


