[Rd] R CMD check for the R code from vignettes

Henrik Bengtsson hb at biostat.ucsf.edu
Fri May 30 19:15:46 CEST 2014

I think there are several aspects to Yihui's post and some simple
workarounds/long-term solutions to the issues:

1. For the reasons argued, I would agree that 'R CMD check'
incorrectly assumes that the tangled code script should be able to run
without errors.  Instead I think it should only check the syntax, i.e.
that it can be parsed without errors.  If not, then Sweave may have to
be redefined to clarify that \Sexpr{}/"inline" expressions must not
have "side effects".

2. For other (=non-Sweave) vignette builder packages, you can already
today define engines that do not tangle, think
%\VignetteEngine{knitr::knitr_no_tangle}.

3. Extending on this, I'd like to propose %\VignetteTangle{no} (and/or
false, FALSE, ...), which would tell the engine to not generate the
"tangle" script file.  Then it is up to the vignette engine to
acknowledge this or not, but at least we will have a standard across
engines rather than each of us coming up with our own markup for this.
You can also imagine that one could support other types of settings, e.g.
%\VignetteTangle{all} to include also \Sexpr{} in the tangled output.
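
To illustrate the difference in (1) between checking syntax and running
the tangled script, here is a minimal sketch.  It assumes the tangled
output of Yihui's example is essentially the single line "x + 1" (the
inline \Sexpr{x <- 1} is dropped); the variable names are only for
illustration and this is not what 'R CMD check' does internally.

```r
# Assumed tangled output of Yihui's example: the inline assignment
# \Sexpr{x <- 1} is not written to the script, only the chunk is.
tangled <- "x + 1"

# Syntax-only check: parsing succeeds although 'x' is never defined.
exprs <- parse(text = tangled)

# Running the script in a clean environment fails, because 'x' is
# missing; this is the kind of failure a source()-style check reports.
res <- try(eval(exprs, envir = new.env(parent = baseenv())), silent = TRUE)
run_failed <- inherits(res, "try-error")
```

With a parse-only check the vignette above would pass; with a run-based
check it fails even though weaving succeeded.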

/Henrik

On Fri, May 30, 2014 at 9:29 AM, Carl Boettiger <cboettig at gmail.com> wrote:
> Hi Yihui,
>
> I agree with you (and your comments in [knitr issue 784]) that it seems
> wrong for R CMD check to be using tangle (purl, etc) as a way to check R
> code in a vignette, when the standard and expected way to check the
> vignette is already to knit / Sweave the vignette.
>
> I also agree with the perspective that the tangle function no longer plays
> the crucial role it did when we were using noweb and C programs that
> couldn't be compiled without tangle.
>
> However, I would be hesitant to see tangle removed entirely, as it is
> occasionally a convenient way to create an R script from a dynamic
> document.  Pure R scripts are still much more widely recognized than
> dynamic documents, and I sometimes will just tangle out the R code because
> a collaborator would have no idea what to do with a .Rmd file (Though
> RStudio is certainly improving this situation).  Tangle-like functions also
> provide a nice complement to "stitch" and friends that make dynamic
> documents from the ubiquitous R scripts.
>
> [knitr issue 784]: https://github.com/yihui/knitr/issues/784
>
>
> - Carl
>
>
>
> On Fri, May 30, 2014 at 6:21 AM, Kevin Coombes <kevin.r.coombes at gmail.com>
> wrote:
>
>> Hi,
>>
>> Unless someone is planning to change Stangle to include inline expressions
>> (which I am *not* advocating), I think that relying on side-effects within
>> an \Sexpr construction is a bad idea. So, my own coding style is to
>> restrict my use of \Sexpr to calls of the form
>> \Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less
>> believe that having R CMD check use Stangle and report an error is probably
>> a good thing.
>>
>> There is a completely separate question about the relationship between
>> Sweave/Stangle or knit/purl and literate programming that is linked to your
>> question about whether to use Stangle on vignettes. The underlying model(s)
>> in R have drifted away from Knuth's original conception, for some good
>> reasons.
>>
>> The original goal of literate programming was to be able to explain the
>> algorithms and data structures in the code to humans.  For that purpose, it
>> was important to have named code chunks that you could move around, which
>> would allow you to describe the algorithm starting from a high level
>> overview and then drilling down into the details. From this perspective,
>> "tangle" was critical to being able to reconstruct a program that would
>> compile and run correctly.
>>
>> The vast majority of applications of Sweave/Stangle or knit/purl in modern
>> R have a completely different goal: to produce some sort of document that
>> describes the results of an analysis to a non-programmer or
>> non-statistician.  For this goal, "weave" is much more important than
>> "tangle", because the most important aspect is the ability to integrate the
>> results (figures, tables, etc) of running the code into the document that
>> gets passed off to the person for whom the analysis was prepared. As a
>> result, the number of times in my daily work that I need to explicitly
>> invoke Stangle (or purl) is many orders of magnitude smaller than the
>> number of times that I invoke Sweave (or knitr).
>>
>>   -- Kevin
>>
>>
>>
>> On 5/30/2014 1:04 AM, Yihui Xie wrote:
>>
>>> Hi,
>>>
>>> Recently I saw a couple of cases in which the package vignettes were
>>> somewhat complicated so that Stangle() (or knitr::purl() or other
>>> tangling functions) can fail to produce the exact R code that is
>>> executed by the weaving function Sweave() (or knitr::knit(), ...). For
>>> example, this is a valid document that can pass the weaving process
>>> but cannot generate a valid R script to be source()d:
>>>
>>> \documentclass{article}
>>> \begin{document}
>>> Assign 1 to x: \Sexpr{x <- 1}
>>> <<>>=
>>> x + 1
>>> @
>>> \end{document}
>>>
>>> That is because the inline R code is not written to the R script
>>> during the tangling process. When an R package vignette contains
>>> inline R code expressions that have significant side effects, R CMD
>>> check can fail because the tangled output is not correct. What I
>>> showed here is only a trivial example, and I have seen two packages
>>> that have more complicated scenarios than this. Anyway, the key thing
>>> that I want to discuss here is, since the R code in the vignette has
>>> been executed once during the weaving process, does it make much sense
>>> to execute the code generated from the tangle function? In other
>>> words, if the weaving process has succeeded, is it necessary to
>>> source() the R script again?
>>>
>>> The two options here are:
>>>
>>> 1. Do not check the R code from vignettes;
>>> 2. Or fix the tangle function so that it produces exactly what was
>>> executed in the weaving process. If this is done, I'm back to my
>>> previous question: does it make sense to run the code twice?
>>>
>>> To push this a little further, personally I do not quite appreciate
>>> literate programming in R as two separate steps, namely weave and
>>> tangle. In particular, I do not see the value of tangle, considering
>>> Sweave() (or knitr::knit()) as the new "source()". Therefore
>>> eventually I tend to just drop tangle, but perhaps I missed something
>>> here, and I'd like to hear what other people think about it.
>>>
>>> Regards,
>>> Yihui
>>> --
>>> Yihui Xie <xieyihui at gmail.com>
>>> Web: http://yihui.name
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>
>
>
> --
> Carl Boettiger
> UC Santa Cruz
> http://carlboettiger.info/