[Bioc-devel] Sweave changes (keep.source = TRUE or FALSE?)

Kevin R. Coombes krc at mdacc.tmc.edu
Thu Dec 7 16:13:19 CET 2006


One of the problems I have with using R and BioConductor is that 
backwards compatibility rarely seems to be considered when new versions 
are released.  (That statement may be wrong, but it is the impression I 
have formed from watching things change over time.) Someone gets an idea 
for a structural change that can potentially break tons of existing 
code, and because it is theoretically better (and may even really be 
better; that's not the point), they go ahead and implement the change. 
And when some poor user raises a question on one of these mailing lists, 
the most common answer seems to be "upgrade to the latest versions of R 
and BioConductor, and modify your code accordingly".  And then do it 
again next quarter.

By contrast, I have documents written in TeX in the early 1980s that 
still compile and still produce EXACTLY the same output. And I still 
have a lot of old perl scripts that still do exactly what they are 
supposed to do.  In those cases, Donald Knuth and Larry Wall have acted 
as "benevolent dictators" who insist that the people proposing changes 
have to at least give serious thought to how to ensure that existing 
code doesn't break.

I suspect my reaction to the proposed changes in Sweave arises 
precisely because I like Sweave so much, and use it for every analysis 
I perform.  Its primary virtue is for the production of documents that 
will need to remain available for a long time. That's why I want the 
documentation and the code in the same file, after all, so I can return 
to it when it's time to write the methods section of the manuscript that 
resulted from all those computations. And I can return to it again when 
someone sends me a question after the paper gets published. And by that 
time, I will probably have upgraded R and BioConductor, and I want the 
figures that I generate tomorrow to still be the same as the figures I 
sent to the publisher yesterday. And if I generated the actual PDF that 
I sent to the publisher from an Sweave file and if it includes code 
samples, then I don't want those code samples to change in the PDF file 
I produce tomorrow.

And yes, I do propose not changing the DEFAULT behavior of any existing 
function. That's what backwards compatibility means.  If you add 
additional features and you are going to put in an option to let the 
user control the behavior anyway, then the default option should ensure 
that code that works now will continue to produce the same results in 
the future. Even in new versions of R. Even in new versions of 
BioConductor.

Best,
	Kevin

Friedrich Leisch wrote:
>>>>>> On Wed, 06 Dec 2006 12:37:22 -0600,
>>>>>> Kevin R Coombes (KRC) wrote:
> 
>   > Hi,
>   > I don't really think anyone believes that the parse&deparse behavior was 
>   > exactly a "feature". Instead, I think the primary issue is one of 
>   > backwards compatibility.
> 
>   > You are proposing to change the behavior of Sweave in a manner that will 
>   > cause old code to break. Here "break" has two meanings. Some automatic 
>   > development tools will stop working on existing valid code.  In 
>   > addition, existing valid code will produce results that differ from what 
>   > they produced previously.
> 
>   > To deal with this, you are going to add an option that will allow users 
>   > to get the old behavior.  However, you propose to set the default value 
>   > of the option to require users to go back and modify all their old code 
>   > in order to prevent things from breaking. It seems obvious to me that 
>   > the default behavior should be the one that does not break old code or 
>   > require the editing of old code in order to get the old behavior.
> 
>   > The reason I use Sweave (for virtually every analysis I do any more) is 
>   > that I can guarantee that when I go back to the code six months from 
>   > now, I can regenerate the analysis and I can regenerate the 
>   > documentation, and I know that I will get the same results. Changing the 
>   > default behavior of Sweave violates that guarantee, since the 
>   > documentation will not be identical to what it was before. Personally, I 
>   > am willing to pay the cost with NEW analyses to invoke the new behavior 
>   > explicitly (which I do agree is the preferred behavior) because I think 
>   > the goal of backwards compatibility is more important.
> 
>   > In other words, I disagree with your characterization of the 
>   > parse&deparse behavior as a "bug".  It did not cause incorrect results 
>   > in the documentation or the code, and everyone using Sweave knew about 
>   > the behavior.
> 
> Give me a break, that is simply nonsense. Sweave guarantees that you
> can reproduce your results USING THE VERSION OF R THAT WAS USED FOR
> THE ORIGINAL ANALYSIS, and that will still be true, because in R 2.4.x
> there is no option keep.source.  Running a new version of R means
> dozens of R functions will have changed, plotting functions may yield
> figures that look different etc. etc. ... what you propose is in
> essence that we are not allowed to change the default behaviour of ANY
> R FUNCTION. Is that what you are proposing?
> 
> And note that the changes I propose will not change any numerical
> results or figures, nor will I "break code", the only thing that
> changes is that the formatting of input lines looks different (and in
> most cases better, that's why we want to do it).
> 
> It's not like I am changing Sweave behavior every other week, actually
> this is the first one ever. I have thought a lot about whether I want to do
> it or not, and I really think it is a good idea. What is great about R
> is that it is allowed to change (if changes are transparent and
> announced early enough). There is a certain operating system with a
> market share of about 90% where backwards compatibility is more
> important than development or security, and I don't think that should
> be our role model.
> 
> Best,
> Fritz


