[R] Google's R Style Guide (has become S3 vs S4, in part)

Martin Morgan mtmorgan at fhcrc.org
Tue Sep 1 18:07:05 CEST 2009


spencerg wrote:
> Bryan Hanson wrote:
>> Looks like the discussion is no longer about R Style, but S3 vs S4?

yes nice topic rename!

>>
>> To that end, I asked more or less the same question a few weeks ago,
>> arising
>> from much the same motivations.  The discussion was helpful,
>> here's the
>> link: 
>> http://www.nabble.com/Need-Advice%3A-Considering-Converting-a-Package-from-S3-to-S4-tc24901482.html#a24904049
>>
>> For what it's worth, I decided, but with some ambivalence, to stay
>> with S3
>> for now and possibly move to S4 later.  In the spirit of S4, I did
>> write a
>> function that is nearly the equivalent of validObject for my S3 object of
>> interest.
>>
>> Overall, it looked like I would have to spend a lot of time moving to S4,
>> while staying with S3 would allow me to get the project done and get
>> results
>> going much faster (see Frank Harrell's comment in the thread above).

Bryan's original post started me thinking about this, but I didn't
respond. I'd classify myself as an 'S4' 'expert', with my ignorance of
S3 obvious from Duncan's corrections to my earlier post. It's hard for
me to make a comparative statement about S3 vs. S4, and hard really to
know what is 'hard' for someone new to S4, to R, to programming, ... I
would have classified most of the responses in that thread as coming
from 'S3' 'experts'.

>> As a concrete example (concrete for us non-programmers,
>> non-statisticians),
>> I recently decided that I wanted to add a descriptive piece of text to a
>> number of my plots, and it made sense to include the text with the
>> object.
>> So I just added a list element to the existing S3 object, e.g.
>> Myobject$descrip  No further work was necessary, I could use it right
>> away.
>> If instead I had made Myobject an S4 object, then I would have to go
>> back, redefine the object, update validObject, and possibly write some
>> new
>> accessor and definitely constructor functions.  At least, that's how I
>> understand the way one uses S4 classes.

This is a variant of Gabor's comment, I guess, that it's easy to modify
S3 on an as-needed basis. In S3, forgoing any pretext of 'best
practices', one might

s3 <- structure(list(x=1:10, y=10:1), class="MyS3Object")
## some lines of code...
if (aTest)
    s3$descraption <- "A description"

(either 'description' or 'descraption' is a typo, uncaught by S3).

In S4 I'd have to change my class definition from

setClass("MyS4Object", representation(x="numeric", y="numeric"))

to

setClass("MyS4Object", representation(x="numeric", y="numeric",
         description="character"))

but the body of the code would look surprisingly similar

s4 <- new("MyS4Object", x=1:10, y=10:1)
## some lines of code...
if (aTest)
    s4@description <- "A description"

(no typo, because I'd have been told that the slot 'descraption' didn't
exist). In the S3 case the (implicit) class definition is a single line,
perhaps nested deep inside a function. In S4 the class definition is in
a single location.

Best practices might make me want to have a validity method (x and y the
same dimensions? 'description' of length 1?), to use a constructor and
accessors (to provide an abstraction to separate the interface from its
implementation), etc., but those issues are about best practices.
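The best practices just mentioned might be sketched as follows. This is only an illustration: the particular validity checks, the constructor MyS4Object(), and the accessor generic description() are assumptions made for the example, not anything prescribed above.

```r
library(methods)

# Class definition with a validity method; the specific checks are
# illustrative assumptions.
setClass("MyS4Object",
    representation(x="numeric", y="numeric", description="character"),
    validity=function(object) {
        msg <- character()
        if (length(object@x) != length(object@y))
            msg <- c(msg, "'x' and 'y' must have the same length")
        if (length(object@description) != 1L)
            msg <- c(msg, "'description' must have length 1")
        if (length(msg)) msg else TRUE
    })

# Hypothetical constructor: users call this rather than new() directly.
MyS4Object <- function(x=numeric(), y=numeric(),
                       description=NA_character_) {
    new("MyS4Object", x=x, y=y, description=description)
}

# Hypothetical accessor: callers need not know the slot name.
setGeneric("description",
    function(object, ...) standardGeneric("description"))
setMethod("description", "MyS4Object",
    function(object, ...) object@description)

s4 <- MyS4Object(x=1:10, y=10:1, description="A description")
description(s4)

# An inconsistent object is rejected at creation; the error message
# is captured here instead of stopping the script.
res <- tryCatch(MyS4Object(x=1:10, y=1:3),
                error=function(e) conditionMessage(e))
```

Code that only ever calls MyS4Object() and description() would not need to change if the slot were later renamed.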

A downstream consequence is that s4 always has a 'description' slot
(perhaps initialized with an appropriate default in the 'prototype'
argument of setClass, but that's more advanced), whereas s3 only
sometimes has 'description'. So I'm forced to check
is.null(s3$description) whenever I'm expecting a character vector.
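A small sketch of that difference, assuming NA_character_ is an acceptable default for the description (that choice, and the variable names s3Label and s4Label, are illustrative only):

```r
library(methods)

# S3: the element may or may not exist, so every use needs a guard.
s3 <- structure(list(x=1:10, y=10:1), class="MyS3Object")
s3Label <- if (is.null(s3$description)) NA_character_ else s3$description

# S4: the slot always exists; 'prototype' supplies its default value.
setClass("MyS4Object",
    representation(x="numeric", y="numeric", description="character"),
    prototype(description=NA_character_))
s4 <- new("MyS4Object", x=1:10, y=10:1)
s4Label <- s4@description    # always a character vector, no guard needed
```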

>      It doesn't stop there:  If you keep the same name for your
> redefined S4 class, I don't know what happens when you try to access
> stored objects of that class created before the change, but it might not
> be pretty.  If you give your redefined S4 class a different name, then

Actually, the old object is loaded in R. It is not valid
(validObject(originalS4) would complain about 'slots in class definition
not in object'). One might write an 'updateObject' generic and method
that detects and corrects this. This contrasts with S3, where there is
no knowing whether the object is consistent with the current (implicit)
class definition.
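One way such an update might look, as a sketch only: the 'updateObject' generic below is defined locally for illustration, and filling the new slot with NA_character_ is an assumption. Redefining the class within one session stands in for loading an object saved under the old definition.

```r
library(methods)

# Original definition, and an object created under it.
setClass("MyS4Object", representation(x="numeric", y="numeric"))
originalS4 <- new("MyS4Object", x=1:10, y=10:1)

# The class is later redefined with an extra slot.
setClass("MyS4Object", representation(x="numeric", y="numeric",
         description="character"))

# The old object still exists but no longer validates.
oldIsValid <- tryCatch({ validObject(originalS4); TRUE },
                       error=function(e) FALSE)

# Hypothetical generic, defined here for illustration only.
setGeneric("updateObject",
    function(object, ...) standardGeneric("updateObject"))

setMethod("updateObject", "MyS4Object", function(object, ...) {
    # Rebuild only when the object predates the 'description' slot.
    if (!.hasSlot(object, "description"))
        object <- new("MyS4Object", x=object@x, y=object@y,
                      description=NA_character_)
    object
})

updatedS4 <- updateObject(originalS4)
validObject(updatedS4)    # no error once the slot has been added
```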

> you have a lot more code to change before you can use the redefined
> class like you want.

For slot addition, this is not true -- old code works fine. For slot
removal / renaming, this is analogous to S3 -- code needs reworking; use
of accessors might help isolate code using the class from the
implementation of the class.

A couple of comments on Duncan's

S3Foo <- function(x=numeric(), y=numeric()) {
  structure(list(x=as.numeric(x), y=as.numeric(y)), class="S3Foo")
}

I used makeS3Foo to emphasize that it was a constructor, but in my own
code I use S3Foo(). Realizing that, as Henrik has now also pointed out,
I'm far from perfect, the use of as.numeric() combines validity checking
and coercion, which I think is not usually a good thing (even when
efficient). In particular this

  as.numeric(factor(c("one", "two", "three")))

might unintentionally propagate earlier mistakes, e.g., after read.table
converts characters to factors behind the unsuspecting user's back.
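To make the concern concrete, here is a sketch with digit strings, where the problem is easier to see. The stricter S3Foo() below is a hypothetical variant that checks instead of coercing; it is not Duncan's version.

```r
# Factor levels are sorted as strings, so the integer codes returned by
# as.numeric() need not match the labels.
f <- factor(c("10", "20", "5"))
codes <- as.numeric(f)                  # 1 2 3, the internal codes
values <- as.numeric(as.character(f))   # 10 20 5, the intended values

# Hypothetical stricter constructor: validate, do not silently coerce.
S3Foo <- function(x=numeric(), y=numeric()) {
    stopifnot(is.numeric(x), is.numeric(y))
    structure(list(x=x, y=y), class="S3Foo")
}

# A factor is refused instead of being turned into its codes.
res <- tryCatch(S3Foo(x=f, y=1:3), error=function(e) "rejected")
```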

Martin


>      By contrast, with S3, if you have any code that tests the number of
> components in a list, that will have to be changed.
> 
>      Spencer
>> Back to trying to get something done!  Bryan
>> *************
>> Bryan Hanson
>> Professor of Chemistry & Biochemistry
>> DePauw University, Greencastle IN USA
>>
>>
>>
>>
>>
>> On 9/1/09 6:16 AM, "Duncan Murdoch" <murdoch at stats.uwo.ca> wrote:
>>
>>  
>>> Corrado wrote:
>>>    
>>>> Thanks Duncan, Spencer,
>>>>
>>>> To clarify, the situation is:
>>>>
>>>> 1) I have no reasons to choose S3 on S4 or vice versa, or any other
>>>> coding
>>>> convention
>>>> 2) Our group has not done any OO developing in R and I would be the
>>>> first, so
>>>> I can set up the standards
>>>> 3) I am starting from scratch with a new package, so I do not have
>>>> any code I
>>>> need to re-use.
>>>> 4) I am an R OO newbie, so whatever I can learn from the beginning
>>>> what is
>>>> better and good for me.
>>>>
>>>> So the questions would be two:
>>>>
>>>> 1) What coding style guide should we / I follow? Is the google style
>>>> guide
>>>> good, or is there something better / more prescriptive which makes our
>>>> research group life easier?
>>>>         
>>> I don't think I can answer that.  I'd recommend planning to spend some
>>> serious time on the decision, and then go by your personal impression.
>>> S4 is definitely harder to learn but richer, so don't make the decision
>>> too quickly.  Take a look at John Chambers' new book, try small projects
>>> in each style, etc.
>>>
>>>    
>>>> 2) What class type should I use? From what you two say, I should use S3
>>>> because is easier to use .... what are the disadvantages? Is there an
>>>> advantages / disadvantages table for S3 and S4 classes?
>>>>         
>>> S3 is much more limited than S4.  It dispatches on just one argument, S4
>>> can dispatch on several.  S3 allows you to declare things to be of a
>>> certain class with no checks that anything will actually work; S4 makes
>>> it easier to be sure that if you say something is of a certain class, it
>>> really is.  S4 hides more under the hood: if you understand how regular
>>> R functions work, learning S3 is easy, but there's still a lot to learn
>>> before you'll be able to use S4 properly.
>>>
>>> Duncan Murdoch
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>     
>>
>>
>>   
> 
>



