[Rd] R CMD check for the R code from vignettes

Sat May 31 06:22:03 CEST 2014

Hi Kevin,

Personally I also avoid code that have side effects in the inline
expressions, but I think there are legitimate use cases in which
inline expressions have side effects. This discussion was motivated by
Carl's knitcitations package, as well as another question on
StackOverflow (http://stackoverflow.com/q/23927325/559676).

I'm aware of the distinction between the original literate programming
paradigm and the one in R (that is why I said "literate programming in
R" instead of "literate programming in general"). In R, weave actually
does what both weave and tangle do in the original paradigm -- there
is no need to tangle the document to get the computer code so that we
can execute it.

To Carl: I agree that it is a little extreme to drop tangle entirely,
so I think at least knitr::purl() will stay there in the foreseeable
future. I tend to adopt Henrik's idea, i.e., to provide vignette
engines that just ignore tangle. At the moment, it seems R CMD check
is comfortable with vignettes that do not have corresponding R
scripts, and I hope these R scripts will not become mandatory in the
future.

Thanks everyone for your comments!

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name

On Fri, May 30, 2014 at 8:21 AM, Kevin Coombes
<kevin.r.coombes at gmail.com> wrote:
> Hi,
>
> Unless someone is planning to change Stangle to include inline expressions
> (which I am *not* advocating), I think that relying on side-effects within
> an \Sexpr construction is a bad idea. So, my own coding style is to restrict
> my use of \Sexpr to calls of the form
> \Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less believe
> that having R CMD check use Stangle and report an error is probably a good
> thing.
>
> There is a completely separate questions about the relationship between
> Sweave/Stangle or knit/purl and literate programming that is linked to your
> question about whether to use Stangle on vignettes. The underlying model(s)
> in R have drifted away from Knuth's original conception, for some good
> reasons.
>
> The original goal of literate programming was to be able to explain the
> algorithms and data structures in the code to humans.  For that purpose, it
> was important to have named code chunks that you could move around, which
> would allow you to describe the algorithm starting from a high level
> overview and then drilling down into the details. From this perspective,
> "tangle" was critical to being able to reconstruct a program that would
> compile and run correctly.
>
> The vast majority of applications of Sweave/Stangle or knit/purl in modern R
> have a completely different goal: to produce some sort of document that
> describes the results of an analysis to a non-programmer or
> non-statistician.  For this goal, "weave" is much more important than
> "tangle", because the most important aspect is the ability to integrate the
> results (figures, tables, etc) of running the code into the document that
> get passed off to the person for whom the analysis was prepared. As a
> result, the number of times in my daily work that I need to explicitly
> invoke Stangle (or purl) explicitly is many orders of magnitude smaller than
> the number of times that I invoke Sweave (or knitr).
>
>   -- Kevin
>
>
>
> On 5/30/2014 1:04 AM, Yihui Xie wrote:
>>
>> Hi,
>>
>> Recently I saw a couple of cases in which the package vignettes were
>> somewhat complicated so that Stangle() (or knitr::purl() or other
>> tangling functions) can fail to produce the exact R code that is
>> executed by the weaving function Sweave() (or knitr::knit(), ...). For
>> example, this is a valid document that can pass the weaving process
>> but cannot generate a valid R script to be source()d:
>>
>> \documentclass{article}
>> \begin{document}
>> Assign 1 to x: \Sexpr{x <- 1}
>> <<>>=
>> x + 1
>> @
>> \end{document}
>>
>> That is because the inline R code is not written to the R script
>> during the tangling process. When an R package vignette contains
>> inline R code expressions that have significant side effects, R CMD
>> check can fail because the tangled output is not correct. What I
>> showed here is only a trivial example, and I have seen two packages
>> that have more complicated scenarios than this. Anyway, the key thing
>> that I want to discuss here is, since the R code in the vignette has
>> been executed once during the weaving process, does it make much sense
>> to execute the code generated from the tangle function? In other
>> words, if the weaving process has succeeded, is it necessary to
>> source() the R script again?
>>
>> The two options here are:
>>
>> 1. Do not check the R code from vignettes;
>> 2. Or fix the tangle function so that it produces exactly what was
>> executed in the weaving process. If this is done, I'm back to my
>> previous question: does it make sense to run the code twice?
>>
>> To push this a little further, personally I do not quite appreciate
>> literate programming in R as two separate steps, namely weave and
>> tangle. In particular, I do not see the value of tangle, considering
>> Sweave() (or knitr::knit()) as the new "source()". Therefore
>> eventually I tend to just drop tangle, but perhaps I missed something
>> here, and I'd like to hear what other people think about it.
>>
>> Regards,
>> Yihui
>> --
>> Yihui Xie <xieyihui at gmail.com>
>> Web: http://yihui.name