[Rd] R vs. C now rather: how to ease package checking
Claudia Beleites
cbeleites at units.it
Tue Jan 18 10:48:48 CET 2011
On 01/18/2011 01:13 AM, Dominick Samperi wrote:
> On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves
> <spencer.graves at structuremonitoring.com> wrote:
>
>> Hi, Dominick, et al.:
>>
>>
>> Demanding complete unit test suites with all software contributed to
>> CRAN would likely cut contributions by a factor of 10 or 100. For me, the R
>> package creation process is close to perfection in providing a standard
>> process for documentation with places for examples and test suites of
>> various kinds. I mention "perfection" because it makes developing
>> "trustworthy software" (Chambers's "prime directive") relatively easy without
>> forcing people to do things they don't feel comfortable doing.
>>
>
> I don't think I made myself clear, sorry. I was not suggesting that package
> developers include a complete unit test suite. I was suggesting that unit
> testing should be done outside of the CRAN release process. Packages should
> be submitted for "release" to CRAN after they have been tested (the
> responsibility of the package developers). I understand that the main
> problem here is that package developers do not have access to all supported
> platforms, so the current process is not likely to change.
Regarding access to all platforms: there is R-Forge, where packages are built
and checked nightly on Linux, Windows, and Mac (although for some months now the
check logs for 32-bit Linux and Windows have not been available; I hope they
will be back soon).
I found it extremely easy to get an account and project space and to have my
package built.
Many thanks to R-Forge!
Regarding complete unit test suites:
Rather than enforcing them mechanically, I think it would be nicer and better to
reward packages that provide them, e.g. by showing icons that announce whether a
package comes with a vignette, a test suite (with code coverage), etc.
My 2 ct,
Claudia
>
> Dominick
>
>
>>
>> If you need more confidence in the software you use, you can build
>> your own test suites -- maybe in packages you write yourself -- or pay
>> someone else to develop test suites to your specifications. For example,
>> Revolution Analytics offers "Package validation, development and support".
>>
>>
>> Spencer
>>
>>
>>
>> On 1/17/2011 3:27 PM, Dominick Samperi wrote:
>>
>>> On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves
>>> <spencer.graves at structuremonitoring.com> wrote:
>>>
>>>> Hi, Paul:
>>>>
>>>>
>>>> The "Writing R Extensions" manual says that *.R code in a "tests"
>>>> directory is run during "R CMD check". I suspect that many R programmers
>>>> do this routinely. I probably should do that also. However, for me, it's
>>>> simpler to have everything in the "examples" section of *.Rd files. I
>>>> think the examples with independently developed answers provide useful
>>>> documentation.
>>>>
>>> This is a unit test function, and I think it would be better if there
>>> was a way to unit test packages *before* they are released to CRAN.
>>> Otherwise, this is not really a "release"; it is a test or "beta"
>>> version. This is currently possible under Windows using
>>> http://win-builder.r-project.org/, for example.
>>>
>>> My earlier remark about the release process was more about documentation
>>> than about unit testing: more about the gentle "nudging" that the R release
>>> process does to help ensure consistent documentation and organization, and
>>> about how this nudging might be extended to the C/C++ part of a package.
>>>
>>> Dominick
>>>
>>>
>>>> Spencer
>>>>
>>>>
>>>>
>>>> On 1/17/2011 1:52 PM, Paul Gilbert wrote:
>>>>
>>>>> Spencer
>>>>>
>>>>> Would it not be easier to include this kind of test in a small file in
>>>>> the tests/ directory?
>>>>>
>>>>> Paul
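A minimal sketch of what Paul suggests, assuming a hypothetical package whose function myfunc() returns cumulative sums (the file name, package name, and function are placeholders, not from the thread). R CMD check runs each *.R file in the package's tests/ directory and reports a failure if the script stops with an error, so stopifnot() is enough to turn hand-checked comparisons into tests:

```r
# Hypothetical file tests/test-myfunc.R.
# In a real package this script would start with library(mypkg) (hypothetical
# package name); a stand-in myfunc() is defined here so the sketch runs alone.
myfunc <- function(x) cumsum(x)

# Each condition must be TRUE; otherwise stopifnot() raises an error and
# R CMD check reports the failing test file.
stopifnot(
  all.equal(myfunc(c(1, 2, 3, 4)), c(1, 3, 6, 10)),  # hand-computed answer
  identical(myfunc(numeric(0)), numeric(0))          # edge case: empty input
)
```

Unlike the examples section, nothing in such a file appears on the help page, so it can hold as many detailed checks as needed.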
>>>>>
>>>>> -----Original Message-----
>>>>> From: r-devel-bounces at r-project.org
>>>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Spencer Graves
>>>>> Sent: January 17, 2011 3:58 PM
>>>>> To: Dominick Samperi
>>>>> Cc: Patrick Leyshock; r-devel at r-project.org; Dirk Eddelbuettel
>>>>> Subject: Re: [Rd] R vs. C
>>>>>
>>>>>
>>>>> For me, a major strength of R is the package development
>>>>> process. I've found this so valuable that I created a Wikipedia entry
>>>>> by that name and made additions to a Wikipedia entry on "software
>>>>> repository", noting that this process encourages good software
>>>>> development practices that I have not seen standardized for other
>>>>> languages. I encourage people to review this material and make
>>>>> additions or corrections as they like (or send me suggestions for me to
>>>>> make appropriate changes).
>>>>>
>>>>>
>>>>> While R has other capabilities for unit and regression testing, I
>>>>> often include unit tests in the "examples" section of documentation
>>>>> files. To keep from cluttering the examples with unnecessary material,
>>>>> I often include something like the following:
>>>>>
>>>>>
>>>>>         A1 <- myfunc() # to test myfunc
>>>>>
>>>>>         A0 <- ("manual generation of the correct answer for A1")
>>>>>
>>>>>         \dontshow{stopifnot(} # so the user doesn't see "stopifnot("
>>>>>         all.equal(A1, A0) # compare myfunc output with the correct answer
>>>>>         \dontshow{)} # close paren on "stopifnot(".
>>>>>
>>>>>
>>>>> This may not be as good in some ways as a full suite of unit
>>>>> tests, which could be provided separately. However, this has the
>>>>> distinct advantage of including unit tests with the documentation in a
>>>>> way that should help users understand "myfunc". (Unit tests too
>>>>> detailed to show users could be completely enclosed in "\dontshow".)
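With the \dontshow{} markup expanded, the example above is what R CMD check actually executes, while the rendered help page hides the stopifnot( wrapper. A runnable sketch, assuming a hypothetical myfunc() that rescales a vector to sum to one and a hand-computed A0 (both are illustrative, not Spencer's code):

```r
# Hypothetical function under test: rescale a vector so it sums to one.
myfunc <- function(x = c(1, 3)) x / sum(x)

A1 <- myfunc()       # to test myfunc
A0 <- c(0.25, 0.75)  # correct answer worked out by hand: 1/4 and 3/4

# What R CMD check runs once \dontshow{stopifnot(} and \dontshow{)} are
# expanded; a mismatch makes all.equal() return a string and stopifnot() fail.
stopifnot(
  all.equal(A1, A0)  # compare myfunc output with the correct answer
)
```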
>>>>>
>>>>>
>>>>> Spencer
>>>>>
>>>>>
>>>>> On 1/17/2011 11:38 AM, Dominick Samperi wrote:
>>>>>
>>>>>> On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
>>>>>> <spencer.graves at structuremonitoring.com> wrote:
>>>>>>
>>>>>>> Another point I have not yet seen mentioned: If your code is
>>>>>>> painfully slow, that can often be fixed without leaving R by
>>>>>>> experimenting with different ways of doing the same thing -- often
>>>>>>> after profiling your code to find the slowest part, as described in
>>>>>>> chapter 3 of "Writing R Extensions".
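As a rough illustration of the profiling Spencer mentions, base R's Rprof() and summaryRprof() (both in the utils package) show where time is spent. slow_sum() is a hypothetical, deliberately loop-heavy function, and the number of samples recorded will vary by machine:

```r
# Hypothetical, deliberately slow function: accumulates sqrt(i) in an R loop.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)  # start the sampling profiler
res <- slow_sum(1e7)
Rprof(NULL)                         # stop profiling

# by.self: time spent in each function itself; by.total: including callees.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
unlink(prof_file)
```

The by.self table points at the functions worth rewriting first, before any move to C is considered.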
>>>>>>>
>>>>>>>
>>>>>>> If I'm given code already written in C (or some other language),
>>>>>>> unless it's really simple, I may link to it rather than recode it in R.
>>>>>>> However, the problems with portability, maintainability, transparency
>>>>>>> to others who may not be very facile with C, etc., all suggest that it's
>>>>>>> well worth some effort experimenting with alternate ways of doing the
>>>>>>> same thing in R before jumping to C or something else.
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>> Spencer
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 1/17/2011 10:57 AM, David Henderson wrote:
>>>>>>>
>>>>>>>> I think we're also forgetting something, namely testing. If you write
>>>>>>>> your routine in C, you have placed an additional burden upon yourself
>>>>>>>> to test your C code through unit tests, etc. If you write your code in
>>>>>>>> R, you still need the unit tests, but you can rely on the well-tested
>>>>>>>> nature of R to reduce the number of tests of your algorithm. I
>>>>>>>> routinely tell people at Sage Bionetworks, where I am working now,
>>>>>>>> that new C code needs to deliver at least an order of magnitude
>>>>>>>> increase in performance to warrant the effort of moving from R to C.
>>>>>>>>
>>>>>>>> But, then again, I am working with scientists who are not primarily,
>>>>>>>> or even secondarily, coders...
>>>>>>>>
>>>>>>>> Dave H
>>>>>>>>
>>>>>>>>
>>>>>> This makes sense, but I have seen some very transparent algorithms
>>>>>> turned into vectorized R code that is difficult to read (and thus to
>>>>>> maintain or to change). These chunks of optimized R code are like
>>>>>> embedded assembly, in the sense that nobody is likely to want to mess
>>>>>> with them. This could be addressed by including pseudocode for the
>>>>>> original (more transparent) algorithm as a comment, but I have never
>>>>>> seen this done in practice (perhaps it could be enforced by R CMD
>>>>>> check?!).
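A small sketch of the practice Dominick suggests, using a hypothetical moving-average example: the vectorized function carries the transparent algorithm as a comment, and a plain loop version is kept as a reference to check it against. The cumulative-sum trick is only one common way to vectorize a moving average, and it assumes 1 <= k <= length(x):

```r
# Reference implementation: transparent, but loops in R.
moving_avg_loop <- function(x, k) {
  n_out <- length(x) - k + 1
  out <- numeric(n_out)
  for (i in seq_len(n_out)) out[i] <- mean(x[i:(i + k - 1)])
  out
}

# Vectorized implementation (assumes 1 <= k <= length(x)).
# Transparent algorithm, kept here as pseudocode:
#   for i in 1 .. length(x) - k + 1:
#     out[i] = mean(x[i], ..., x[i + k - 1])
# Trick: with cs = c(0, cumsum(x)), the i-th window sum is cs[i + k] - cs[i].
moving_avg_vec <- function(x, k) {
  cs <- c(0, cumsum(x))
  (cs[(k + 1):length(cs)] - cs[seq_len(length(cs) - k)]) / k
}

x <- c(1, 2, 3, 4, 5, 6)
stopifnot(all.equal(moving_avg_vec(x, 3), moving_avg_loop(x, 3)))
moving_avg_vec(x, 3)  # 2 3 4 5
```

Keeping the loop version around also gives the vectorized code a ready-made test.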
>>>>>>
>>>>>> On the other hand, in principle a well-documented piece of C/C++ code
>>>>>> could be much easier to understand, without paying a performance
>>>>>> penalty... but "coders" are not likely to place this high on their list
>>>>>> of priorities.
>>>>>>
>>>>>> The bottom line is that R is an adaptor ("glue") language like Lisp
>>>>>> that makes it easy to mix and match functions (using classes and generic
>>>>>> functions), many of which are written in C (or C++ or Fortran) for
>>>>>> performance reasons. Like any object-based system, there can be a lot
>>>>>> of object copying, and like any functional programming system, there
>>>>>> can be a lot of function calls, resulting in poor performance for some
>>>>>> applications.
>>>>>>
>>>>>> If you can vectorize your R code, then you have effectively found a way
>>>>>> to benefit from somebody else's C code, thus saving yourself some time.
>>>>>> For operations other than pure vector calculations you will have to do
>>>>>> the C/C++ programming yourself (or call a library that somebody else
>>>>>> has written).
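To see the effect Dominick describes, and to measure a rewrite against Dave's rule of thumb of roughly a tenfold gain, one can time both versions with system.time(). A hedged sketch: sumsq_loop() and sumsq_vec() are hypothetical names, and the timings will vary by machine and R version:

```r
# Sum of squares written as an explicit R loop (one element at a time).
sumsq_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi^2
  total
}

# Vectorized version: `^` and sum() run in R's compiled C code.
sumsq_vec <- function(x) sum(x^2)

set.seed(42)
x <- runif(1e6)
t_loop <- system.time(r_loop <- sumsq_loop(x))[["elapsed"]]
t_vec  <- system.time(r_vec  <- sumsq_vec(x))[["elapsed"]]

stopifnot(all.equal(r_loop, r_vec))  # same answer up to rounding
cat(sprintf("loop: %.3f s, vectorized: %.3f s\n", t_loop, t_vec))
```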
>>>>>>
>>>>>> Dominick
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> ----- Original Message ----
>>>>>>>> From: Dirk Eddelbuettel <edd at debian.org>
>>>>>>>> To: Patrick Leyshock <ngkbr8es at gmail.com>
>>>>>>>> Cc: r-devel at r-project.org
>>>>>>>> Sent: Mon, January 17, 2011 10:13:36 AM
>>>>>>>> Subject: Re: [Rd] R vs. C
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17 January 2011 at 09:13, Patrick Leyshock wrote:
>>>>>>>> | A question, please, about development of R packages:
>>>>>>>> |
>>>>>>>> | Are there any guidelines or best practices for deciding when and why
>>>>>>>> | to implement an operation in R, vs. implementing it in C? The
>>>>>>>> | "Writing R Extensions" manual recommends "working in interpreted R
>>>>>>>> | code . . . this is normally the best option." But we do write
>>>>>>>> | C functions and access them in R - the question is, when/why is this
>>>>>>>> | justified, and when/why is it NOT justified?
>>>>>>>> |
>>>>>>>> | While I have identified helpful documents on R coding standards, I
>>>>>>>> | have not seen notes/discussions on when/why to implement in R, vs.
>>>>>>>> | when to implement in C.
>>>>>>>>
>>>>>>>> The (still fairly recent) book 'Software for Data Analysis: Programming
>>>>>>>> with R' by John Chambers (Springer, 2008) has a lot to say about this.
>>>>>>>> John also gave a talk in November which stressed 'multilanguage'
>>>>>>>> approaches; see e.g.
>>>>>>>>
>>>>>>>> http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html
>>>>>>>>
>>>>>>>> In short, it all depends, and it is unlikely that you will get a
>>>>>>>> coherent answer that is valid for all circumstances. We all love R for
>>>>>>>> how expressive and powerful it is, yet there are times when something
>>>>>>>> else is called for. Exactly when that time is depends on a great many
>>>>>>>> things, and you have not mentioned a single metric in your question.
>>>>>>>> So I'd start with John's book.
>>>>>>>>
>>>>>>>> Hope this helps, Dirk
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>>
>>>>>>> R-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>
>>>>
>
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbeleites at units.it