[R] Unit Testing Frameworks: summary and brief discussion

Thu May 10 11:07:14 CEST 2007

Yes, butler is pretty much abandoned.  I didn't realise that runit
existed when I first wrote it, so much of the functionality is
probably implemented better there (with maintainers that are actually
doing something with the package).

That said, butler wasn't intended only for unit testing; it also
contains some code (which doesn't duplicate functionality available
elsewhere, to the best of my knowledge) for benchmarking (benchmark)
and profiling (stopwatch).  The profiling code formats the output of
Rprof in a slightly different way to the default, and also has a
graphical display.  There seems to be a natural connection between
profiling and tree based partitioning, so I hope one day to implement
a visualisation, based on the ideas of Klimt, that makes it easier to
explore profiling data.
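
For readers who haven't used Rprof, here is a minimal sketch in base R
only (it does not use butler's stopwatch, whose interface isn't shown
here).  The function busy_work and the layout of the final table are
made up for illustration; the idea is just to re-tabulate what
summaryRprof returns.

```r
# Toy workload: keep the CPU busy long enough for the sampler to see it.
# (busy_work is a hypothetical example function.)
busy_work <- function(seconds = 0.5) {
  start <- proc.time()[["elapsed"]]
  total <- 0
  while (proc.time()[["elapsed"]] - start < seconds) {
    total <- total + sum(sqrt(seq_len(1000)))
  }
  total
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
invisible(busy_work())
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)     # base R's default tabulation

# One possible alternative layout: function name plus its share of
# self time, largest first.
self_pct <- prof$by.self[, "self.pct", drop = FALSE]
self_pct <- self_pct[order(-self_pct$self.pct), , drop = FALSE]
print(head(self_pct))
```

The exact rows depend on the machine and on how many samples were
taken, so no output is shown.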

Hadley

On 5/9/07, anthony.rossini at novartis.com <anthony.rossini at novartis.com> wrote:
> Greetings -
>
> I've finally finished the review; here's what I heard:
>
> ============ from Tobias Verbeke:
>
> anthony.rossini at novartis.com wrote:
> > Greetings!
> >
> > After a quick look at current programming tools, especially with regard
> > to unit-testing frameworks, I've started looking at both "butler" and
> > "RUnit".   I would be grateful to receive real-world development
> > experience and opinions with either/both.    Please send to me directly
> > (yes, this IS my work email), I will summarize (named or anonymous, as
> > contributors desire) to the list.
> >
> I'm a founding member of an R Competence Center at an international
> consulting company delivering R services mainly to the financial and
> pharmaceutical industries. Unit testing is central to our development
> methodology and we've been systematically using RUnit with great
> satisfaction, mainly because of its simplicity. The presentation of
> test reports is basic, though. Our experiences interacting with the
> RUnit developers are very positive: gentle and responsive people.
>
> We've never used butler. I think it is not actively developed (even if
> the developer is very active).
>
> It should be said that many of our developers (including myself) have
> backgrounds in statistics (more than in CS or software engineering)
> and are not always acquainted with the functionality in other unit
> testing frameworks and the way they integrate into IDEs, as is common
> in these other languages.
>
> I'll soon be personally working with a JUnit guru and will take the
> opportunity to benchmark RUnit/ESS/emacs against his toolkit (Eclipse
> with JUnit and other plugins, working `in perfect harmony' (his
> words)). Even if in my opinion the philosophy of test-driven
> development is much more important than the tools used, it is useful
> to question them from time to time and your message reminded me of
> this... I'll keep you posted if it interests you. Why not work out an
> evaluation grid / checklist for unit testing frameworks?
>
> Totally unrelated to the former, it might be interesting to ask oneself
> how ESS could be extended to ease unit testing: after refactoring a
> function, some M-x ess-unit-test-function automagically launches the
> unit test for this particular function (based on the test function
> naming scheme), opens a *test report* buffer etc.
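
The R side of such a command could be quite small.  This is only a
sketch under assumptions: the "test.<function name>" pairing, the helper
run_test_for and the example functions are all hypothetical, and nothing
here is an existing ESS or RUnit feature.

```r
# A function under test and its test, paired purely by name:
# add_one <-> test.add_one (naming convention assumed for illustration).
add_one <- function(x) x + 1
test.add_one <- function() stopifnot(identical(add_one(1), 2))

# Hypothetical helper: look up and run the test belonging to fun_name.
# Returns a short status string that an editor could show in a buffer.
run_test_for <- function(fun_name, env = parent.frame()) {
  test_name <- paste0("test.", fun_name)
  if (!exists(test_name, envir = env, mode = "function")) {
    return("no test found")
  }
  test_fun <- get(test_name, envir = env, mode = "function")
  tryCatch({
    test_fun()
    "passed"
  }, error = function(e) paste("failed:", conditionMessage(e)))
}

print(run_test_for("add_one"))
print(run_test_for("no_such_function"))
```

An editor command would then only need to send run_test_for("name of
the function at point") to the R process and display the result.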
>
> Kind regards,
> Tobias
>
> ============ from Tony Plate:
>
> Hi, I've been looking at testing frameworks for R too, so I'm interested
> to hear of your experiences & perspective.
>
> Here are my own experiences & perspective:
> The requirements are:
>
> (1) it should be very easy to construct and maintain tests
> (2) it should be easy to run tests, both automatically and manually
> (3) it should be simple to look at test results and know what went
> wrong where
>
> I've been using a homegrown testing framework for S-PLUS that is loosely
> based on the R transcript style tests (run *.R and compare output with
> *.Rout.save in 'tests' dir).  There are two differences between this
> test framework and the standard R one:
> (1) the output to match and the input commands are generated from an
> annotated transcript (annotations can switch some tests in or out
> depending on the version used)
> (2) annotations can include text substitutions (regular expression
> style) to be made on the output before attempting to match (this helps
> make it easier to construct tests that will match across different
> versions that might have minor cosmetic differences in how output is
> formatted).
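
To make the substitution idea concrete, here is a rough sketch in base
R.  It is not Tony's S-PLUS framework (which isn't shown in this
thread); the helper names run_transcript and normalise, the commands
and the timestamp pattern are all invented for illustration.

```r
# Evaluate each command and collect what would have been printed at the
# console (hypothetical helper).
run_transcript <- function(commands, env = new.env()) {
  out <- character(0)
  for (cmd in commands) {
    out <- c(out, capture.output({
      res <- withVisible(eval(parse(text = cmd), envir = env))
      if (res$visible) print(res$value)
    }))
  }
  out
}

# Apply regular-expression substitutions (pattern = replacement) so that
# cosmetic differences do not cause a mismatch (hypothetical helper).
normalise <- function(lines, substitutions) {
  for (pattern in names(substitutions)) {
    lines <- gsub(pattern, substitutions[[pattern]], lines)
  }
  lines
}

commands <- c("x <- c(1, 2, 3, 4)",
              "mean(x)",
              "cat('run at 2007-05-10 11:07:14\\n')")

# Stored transcript, recorded on some other day
stored <- c("[1] 2.5",
            "run at 2006-01-01 00:00:00")

subs <- c("[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}" = "<TIMESTAMP>")

actual  <- run_transcript(commands)
matches <- identical(normalise(actual, subs), normalise(stored, subs))
print(matches)
```

Without the substitution the second line would differ on every run; with
it, only the parts of the output that matter are compared.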
>
> We use this test framework for both unit-style tests and system testing
> (where multiple libraries interact and also call the database).
> One very nice aspect of this framework is that it is easy to construct
> tests -- just cut and paste from a command window.  Many tests can be
> generated very quickly this way (my impression is that it is much, much
> faster to build tests by cutting and pasting transcripts from a command
> window than it is to build tests that use functions like all.equal() to
> compare data structures.) It is also easy to maintain tests in the face
> of change (e.g., with a new version of S-PLUS or with bug fixes to
> functions or with changed database contents) -- I use ediff in emacs to
> compare test output with the stored annotated transcript and can usually
> just use ediff commands to update the transcript.
>
> This has worked well for us and now we are looking at porting some code
> to R.  I've not seen anything that offers these conveniences in R.
>
> It wouldn't be too difficult to add these features to the built-in R
> testing framework, but I've not had success in getting anyone in R core
> to listen to or even consider changes, so I've not pursued that route after
> an initial offer of some simple patches to tests.mk and wintests.mk.
>
> RUnit doesn't have transcript-style tests, but it wasn't very difficult
> to add support for transcript-style tests to it.  I'll probably go ahead
> and use some version of that for our porting project.  (And offer it to
> the community if the RUnit maintainers want to incorporate it.)  I also
> like the idea that RUnit has some code analysis tools -- that might
> support some future project that allowed one to catalogue the number of
> times each code path through a function was exercised by the tests.
>
> I just looked at 'butler' and it looks very much like RUnit to me -- and
> I didn't see any overview that explained differences.  Do you know of
> any differences?
>
> cheers,
>
> Tony Plate
>
>
> ============== from Paul Gilbert:
>
> Tony
>
> While this is not exactly your question, I have been using my own system
> based on make and the tools used by R CMD build/check to do something I
> think of as unit testing. This pre-dates the unit-testing frameworks, in
> fact, some of it predates R. I actually wrote something on this at one
> point: Paul Gilbert. R package maintenance. R News, 4(2):21-24,
> September 2004.
>
> I have occasionally thought about trying to use RUnit, but have never done
> much because I am relatively happy with what I have. (Inertia is an
> issue too.)  I would be happy to hear if you do an assessment of the
> various tools.
>
> Best,
> Paul Gilbert
>
>
> ============= from Seth Falcon:
>
> Hi Tony,
>
> anthony.rossini at novartis.com writes:
> > After a quick look at current programming tools, especially with regard
> > to unit-testing frameworks, I've started looking at both "butler" and
> > "RUnit".   I would be grateful to receive real-world development
> > experience and opinions with either/both.    Please send to me directly
> > (yes, this IS my work email), I will summarize (named or anonymous, as
> > contributors desire) to the list.
>
> I've been using RUnit and have been quite happy with it.  I had not
> heard of butler until I read your mail (!).
>
> RUnit behaves reasonably similarly to other *Unit frameworks and this
> made it easy to get started with as I have used both JUnit and PyUnit
> (unittest module).
>
> Two things to be wary of:
>
>   1. At last check, you cannot create classes in unit test code and
>      this makes it difficult to test some types of functionality.  I'm
>      really not sure to what extent this is RUnit's fault as opposed
>      to a limitation of the S4 implementation in R.
>
>   2. They have chosen a non-default RNG, but recent versions provide a
>      way to override this.  This made for some difficult bug
>      hunting when unit tests behaved differently than hand-run code
>      even with set.seed().
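
The RNG point is easy to reproduce in base R without RUnit.  In this
sketch I'm assuming the non-default generator in question is
"Marsaglia-Multicarry" (that is my recollection of RUnit's default, not
something stated in this thread); the variable names are arbitrary.

```r
old_kind <- RNGkind()[1]            # remember the current generator

set.seed(42)
default_draws <- runif(3)           # what hand-run code would see

RNGkind(kind = "Marsaglia-Multicarry")   # assumed test-harness generator
set.seed(42)
harness_draws <- runif(3)           # same seed, different generator

RNGkind(kind = old_kind)            # restore the original generator

# Same seed, but the two sequences are not expected to agree
print(identical(default_draws, harness_draws))
```

So set.seed() alone does not make results comparable; the RNG kind has
to match as well.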
>
> The maintainer has been receptive to feedback and patches.  You can
> look at the not-so-beautiful scripts and such we are using if you look
> at inst/UnitTest in: Category, GOstats, Biobase, graph
>
> Best Wishes,
>
> + seth
>
>
> =================== Discussion:
>
> After a bit more cursory use, it looks like RUnit is probably the right
> approach at this time (sorry Hadley!).   Both RUnit and butler have a
> range of testing facilities and programming support tools.   I support the
> above statements about feasibility and problems -- except that I didn't
> get a chance to check out the S4 issues that Seth raised above.    The one
> piece that I found missing in my version was some form of GUI-based
> tester, i.e. push a button and test, but I think I've not thought through
> some of the issues with environments and closures yet that might cause
> problems.
>
> Thanks to everyone for responses!  It's clear that there is a good start
> here, but lots of room for improvement exists.
>
> Best regards / Mit freundlichen Grüssen,
> Anthony (Tony) Rossini
> Novartis Pharma AG
> MODELING & SIMULATION
> Group Head a.i., EU Statistical Modeling
> CHBS, WSJ-027.1.012
> Novartis Pharma AG
> Lichtstrasse 35
> CH-4056 Basel
> Switzerland
> Phone: +41 61 324 4186
> Fax: +41 61 324 3039
> Cell: +41 79 367 4557
> Email : anthony.rossini at novartis.com