[Rd] R vs. C

Mon Jan 17 23:15:20 CET 2011

Hi, Paul:

       The "Writing R Extensions" manual says that *.R code in a "tests" 
directory is run during "R CMD check".  I suspect that many R 
programmers do this routinely.  I probably should do that also.  
However, for me, it's simpler to have everything in the "examples" 
section of *.Rd files.  I think the examples with independently
developed answers provide useful documentation.
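As a sketch of Paul's suggestion, a small script under tests/ can hold the same kind of check that Spencer puts in an \examples section. The file name, the function myfunc and its expected values are hypothetical stand-ins. In a real package the script would load the package with library() rather than define the function inline.

```r
## tests/test-myfunc.R (hypothetical file name)
## R CMD check runs the *.R files in tests/; a script that signals an
## error makes the check fail, so stopifnot() is enough here.

## Hypothetical function under test: running mean of a numeric vector.
## In a real package this would come from library(yourpkg) instead.
myfunc <- function(x) cumsum(x) / seq_along(x)

A1 <- myfunc(c(2, 4, 6))   # output of the code being tested
A0 <- c(2, 3, 4)           # answer worked out by hand: 2/1, 6/2, 12/3

stopifnot(isTRUE(all.equal(A1, A0)))
```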

       Spencer

On 1/17/2011 1:52 PM, Paul Gilbert wrote:
> Spencer
>
> Would it not be easier to include this kind of test in a small file in the tests/ directory?
>
> Paul
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Spencer Graves
> Sent: January 17, 2011 3:58 PM
> To: Dominick Samperi
> Cc: Patrick Leyshock; r-devel at r-project.org; Dirk Eddelbuettel
> Subject: Re: [Rd] R vs. C
>
>
>         For me, a major strength of R is the package development
> process.  I've found this so valuable that I created a Wikipedia entry
> by that name and made additions to a Wikipedia entry on "software
> repository", noting that this process encourages good software
> development practices that I have not seen standardized for other
> languages.  I encourage people to review this material and make
> additions or corrections as they like (or send me suggestions so I can
> make the appropriate changes).
>
>
>         While R has other capabilities for unit and regression testing, I
> often include unit tests in the "examples" section of documentation
> files.  To keep from cluttering the examples with unnecessary material,
> I often include something like the following:
>
>
> A1 <- myfunc() # to test myfunc
>
> A0 <- ("manual generation of the correct answer for A1")
>
> \dontshow{stopifnot(} # so the user doesn't see "stopifnot("
> all.equal(A1, A0) # compare myfunc output with the correct answer
> \dontshow{)} # close paren on "stopifnot(".
>
>
>         This may not be as good in some ways as a full suite of unit
> tests, which could be provided separately.  However, this has the
> distinct advantage of including unit tests with the documentation in a
> way that should help users understand "myfunc".  (Unit tests too
> detailed to show users could be enclosed entirely in "\dontshow".)
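One reason the hidden stopifnot( wrapper matters: all.equal() does not return FALSE on a mismatch. It returns a character string describing the difference, so a bare all.equal() call in an example never stops R CMD check. The values below are made up purely to force a mismatch.

```r
A1 <- c(1, 2, 3)      # stand-in for myfunc() output
A0 <- c(1, 2, 3.1)    # deliberately different "correct" answer

res <- all.equal(A1, A0)
print(res)            # a description of the difference, not FALSE
is.character(res)     # TRUE

## Wrapped in stopifnot(), the same comparison signals an error,
## which is what makes R CMD check fail on a wrong answer.
failed <- tryCatch({
  stopifnot(all.equal(A1, A0))
  FALSE
}, error = function(e) TRUE)
failed                # TRUE
```

In ordinary code, isTRUE(all.equal(a, b)) is the usual way to turn the result into a single TRUE/FALSE.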
>
>
>         Spencer
>
>
> On 1/17/2011 11:38 AM, Dominick Samperi wrote:
>> On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
>> <spencer.graves at structuremonitoring.com> wrote:
>>
>>>        Another point I have not yet seen mentioned:  If your code is
>>> painfully slow, that can often be fixed without leaving R by experimenting
>>> with different ways of doing the same thing, often after profiling
>>> your code to find the slowest part, as described in chapter 3 of "Writing R
>>> Extensions".
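A minimal profiling sketch using Rprof() and summaryRprof(), the tools covered in that chapter. The function slow_sum is a hypothetical stand-in for "your code", and the problem size is only chosen so that the profiler collects some samples; the results will differ from machine to machine.

```r
## Hypothetical slow function: a scalar loop of the kind profiling tends to expose.
slow_sum <- function(n) {
  s <- 0
  for (i in seq_len(n)) s <- s + sqrt(i)
  s
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)               # start sampling the call stack
invisible(slow_sum(1e7))       # run the code of interest
Rprof(NULL)                    # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)             # functions ranked by time spent in their own code
```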
>>>
>>>
>>>        If I'm given code already written in C (or some other language),
>>> unless it's really simple, I may link to it rather than recode it in R.
>>> However, the problems with portability, maintainability, transparency to
>>> others who may not be very facile with C, etc., all suggest that it's well
>>> worth some effort experimenting with alternate ways of doing the same thing
>>> in R before jumping to C or something else.
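To illustrate "experimenting with alternate ways of doing the same thing in R", here is a sketch with three hypothetical formulations of one computation. The first grows its result with c(), copying the vector on every iteration; the second allocates once; the third is a single vectorized expression. The identical() check confirms that they agree before any timings are compared; no timings are shown because they depend on the machine.

```r
## Three ways to build the same vector of squares (names are illustrative).
grow <- function(n) {            # grows the result with c(): copies each time
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}
prealloc <- function(n) {        # allocates once, then fills element by element
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}
vectorized <- function(n) seq_len(n)^2   # one call into R's compiled arithmetic

n <- 1e4
stopifnot(identical(grow(n), prealloc(n)),
          identical(prealloc(n), vectorized(n)))

system.time(grow(n))
system.time(prealloc(n))
system.time(vectorized(n))
```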
>>>
>>>        Hope this helps.
>>>        Spencer
>>>
>>>
>>>
>>> On 1/17/2011 10:57 AM, David Henderson wrote:
>>>
>>>> I think we're also forgetting something, namely testing.  If you write
>>>> your routine in C, you have placed an additional burden on yourself to
>>>> test your C code through unit tests, etc.  If you write your code in R,
>>>> you still need the unit tests, but you can rely on the well-tested nature
>>>> of R to reduce the number of tests of your algorithm.  I routinely tell
>>>> people at Sage Bionetworks, where I am working now, that new C code needs
>>>> to deliver at least an order-of-magnitude increase in performance to
>>>> warrant the effort of moving from R to C.
>>>>
>>>> But, then again, I am working with scientists who are not primarily, or
>>>> even secondarily, coders...
>>>>
>>>> Dave H
>>>>
>>>>
>> This makes sense, but I have seen some very transparent algorithms turned
>> into vectorized R code that is difficult to read (and thus to maintain or
>> to change). These chunks of optimized R code are like embedded assembly,
>> in the sense that nobody is likely to want to mess with them. This could
>> be addressed by including pseudocode for the original (more transparent)
>> algorithm as a comment, but I have never seen this done in practice
>> (perhaps it could be enforced by R CMD check?!).
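A sketch of what Dominick describes, using a hypothetical moving-sum function: the vectorized body carries the transparent loop as a comment, and the same loop is kept as a reference implementation that a test compares against. The cumsum() trick is one common way to vectorize a moving sum, chosen here only for illustration.

```r
## Moving sum over windows of width k (hypothetical example).
##
## Transparent algorithm this replaces:
##   for i in k..n:
##     out[i - k + 1] <- x[i - k + 1] + ... + x[i]
##
## Vectorized form: with s = cumsum(c(0, x)), the window ending at i
## sums to s[i + 1] - s[i - k + 1].
moving_sum <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, k <= n)
  s <- cumsum(c(0, x))
  s[(k + 1):(n + 1)] - s[1:(n - k + 1)]
}

## Reference implementation: the loop from the comment, kept for testing.
moving_sum_ref <- function(x, k) {
  n <- length(x)
  vapply(k:n, function(i) sum(x[(i - k + 1):i]), numeric(1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
stopifnot(isTRUE(all.equal(moving_sum(x, 3), moving_sum_ref(x, 3))))
moving_sum(x, 3)   # 8 6 10 15 16 17
```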
>>
>> On the other hand, in principle a well-documented piece of C/C++ code
>> could be much easier to understand, without paying a performance
>> penalty...but "coders" are not likely to place this high on their list of
>> priorities.
>>
>> The bottom line is that R is an adaptor ("glue") language like Lisp that
>> makes it easy to mix and match functions (using classes and generic
>> functions), many of which are written in C (or C++ or Fortran) for
>> performance reasons. Like any object-based system, there can be a lot of
>> object copying, and like any functional programming system, there can be
>> a lot of function calls, resulting in poor performance for some
>> applications.
>>
>> If you can vectorize your R code, then you have effectively found a way to
>> benefit from somebody else's C code, thus saving yourself some time. For
>> operations other than pure vector calculations, you will have to do the
>> C/C++ programming yourself (or call a library that somebody else has
>> written).
>>
>> Dominick
>>
>>
>>
>>>> ----- Original Message ----
>>>> From: Dirk Eddelbuettel<edd at debian.org>
>>>> To: Patrick Leyshock<ngkbr8es at gmail.com>
>>>> Cc: r-devel at r-project.org
>>>> Sent: Mon, January 17, 2011 10:13:36 AM
>>>> Subject: Re: [Rd] R vs. C
>>>>
>>>>
>>>> On 17 January 2011 at 09:13, Patrick Leyshock wrote:
>>>> | A question, please about development of R packages:
>>>> |
>>>> | Are there any guidelines or best practices for deciding when and why to
>>>> | implement an operation in R, vs. implementing it in C?  The "Writing R
>>>> | Extensions" recommends "working in interpreted R code . . . this is
>>>> normally
>>>> | the best option."  But we do write C-functions and access them in R -
>>>> the
>>>> | question is, when/why is this justified, and when/why is it NOT
>>>> justified?
>>>> |
>>>> | While I have identified helpful documents on R coding standards, I have
>>>> not
>>>> | seen notes/discussions on when/why to implement in R, vs. when to
>>>> implement
>>>> | in C.
>>>>
>>>> The (still fairly recent) book 'Software for Data Analysis: Programming
>>>> with R' by John Chambers (Springer, 2008) has a lot to say about this.
>>>> John also gave a talk in November which stressed 'multilanguage'
>>>> approaches; see e.g.
>>>>
>>>> http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html
>>>>
>>>>
>>>> In short, it all depends, and it is unlikely that you will get a coherent
>>>> answer that is valid for all circumstances.  We all love R for how
>>>> expressive and powerful it is, yet there are times when something else is
>>>> called for.  Exactly when that time is depends on a great many things, and
>>>> you have not mentioned a single metric in your question.  So I'd start
>>>> with John's book.
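One concrete metric is elapsed time. The sketch below uses a hypothetical helper, median_elapsed(), built on system.time(), to compare two implementations and compute a speedup ratio that can be checked against David Henderson's order-of-magnitude rule of thumb from this thread. Both versions here are plain R; a compiled version would be timed the same way. The ratio is machine-dependent, so no value is shown.

```r
## Hypothetical helper: median elapsed seconds of f() over `reps` runs.
median_elapsed <- function(f, reps = 5) {
  times <- vapply(seq_len(reps),
                  function(i) system.time(f())[["elapsed"]],
                  numeric(1))
  median(times)
}

x <- runif(1e6)
t_loop <- median_elapsed(function() { s <- 0; for (v in x) s <- s + v; s })
t_vec  <- median_elapsed(function() sum(x))

## Speedup of the faster version; the rule of thumb asks for roughly 10x
## before a rewrite in C is worth the extra testing burden.
speedup <- t_loop / max(t_vec, 1e-6)   # guard against a 0-second timer reading
speedup
speedup >= 10
```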
>>>>
>>>> Hope this helps, Dirk
>>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel