[Rd] S3 vs S4 for a simple package

Mon Jan 7 22:50:24 CET 2008

On Jan 7, 2008 1:34 PM, John Chambers <jmc at r-project.org> wrote:
> Prof Brian Ripley wrote:
> > On Mon, 7 Jan 2008, Robin Hankin wrote:
> >
> >
> >> I am writing a package and need to decide whether to use S3 or S4.
> >>
> >> I have a single class, "multipol"; this needs methods for "[" and "[<-"
> >> and I also need a print (or show) method and methods for arithmetic +-
> >> */^.
> >>
> >> In S4, an object of class "multipol" has one slot that holds an array.
> >>
> >> Objects of class "multipol" require specific arithmetic operations;
> >> a,b being
> >> multipols means that a+b and a*b are defined in peculiar ways
> >> that make sense in the context of the package. I can also add and
> >> multiply
> >> by scalars (vectors of length one).
> >>
> One thing you cannot do in S3 is to have methods that depend on anything
> but the first argument.  Do you want something sensible for 1 + a when
> a is a "multipol"? The default call to the primitive version may or may
> not give you what you want.

I agree with John.  If method dispatch on multiple arguments is
potentially useful to you, and it sounds as if it is, then S4 is
clearly the winner.  In the Matrix package, for example, determining a
good method for evaluating A %*% B or solve(A, B), where A and B are
Matrix objects, or matrix objects would be extremely difficult if we
were not using S4 classes and methods.

> >> My impression is that S3 is perfectly adequate for this task, although
> >> I've not yet finalized the coding.
> >>
> >> S4 seems to be "overkill" for such a simple system.
> >>
> >> Can anyone give me some motivation for persisting with S4?
> >>
> >> Or indeed reassure me that S3 is a good design decision?
> >>
> >
> > Does performance matter?: S4 dispatch is many times slower than S3
> > dispatch for such functions. (It is several times slower in general, but
> > the difference is particularly marked for primitives.)
> >
> Well, the question is whether performance of _method dispatch_ matters,
> which it tends not to in many cases. And it would be good to have some
> data to clarify "many times slower".  Generally, looking up inherited
> methods is likely to take a while, but only the first time does R code
> need to work out the inheritance. On repeated calls with the same
> signature, dispatch should be basically a lookup in an environment.

I can imagine circumstances in which the time spent on method dispatch
would be important but I haven't encountered them and we have been
working with S4 classes and methods in the Matrix and lme4 packages
for a long time now.  Much more important to me is the amount of time
that I spend designing and debugging code and that is where S4 is, for
me, the clear winner.  Because of the formal class definitions and the
validation of objects  I can write C code assuming that the contents
of particular slots in an object have the desired class.  When I end
up dealing with S3 objects not S4 objects I find it very clunky
because there is no formal class definition and, if I am to play it
safe, I should always check for the existence of certain components in
the object then check or coerce the type then check the length, etc.
With S4 classes I only need to do that once in a validate method.

With S4 I feel that I can use a single class system in my R code and
in the underlying C code.  I don't need to create exotic structs or
C++ classes in the compiled code because the S4 class system in R is
enforcing the class definition for me.

I just read Tim Hesterberg's message from later in this thread and I
agree with him that the formal definition of classes makes it awkward
to revise the slots.  Fortunately, I have a very understanding user
base for the lme4 package and they tolerate my changing class
definitions between releases because I manage to convince them somehow
that the package is improved by the changes.