[R] S4 vs Reference Classes
Douglas Bates
bates at stat.wisc.edu
Tue Sep 13 23:24:45 CEST 2011
On Tue, Sep 13, 2011 at 12:54 PM, Joseph Park <jpark.us at att.net> wrote:
> Hi, I'm looking for some guidance on whether to use
> S4 or Reference Classes for an analysis application
> I'm developing.
> I'm a C++/Python developer, and like to 'think' in OOD.
> I started my app with S4, thinking that was the best
> set of OO features in R. However, it appears that one
> needs Reference Classes to allow object methods to assign
> values (other than the .Object in the initialize method)
> to slots of the object.
> This is typically what I prefer: creating an object, then
> operating on the object (reference) calling object methods
> to access/modify slots.
> So I'm wondering what (dis)advantages there are in
> developing with S4 vs Reference Classes.
> Things of interest:
> Performance (i.e. memory management)
> Integration compatibility with R packages
> ??? other issues
From a C++/Python background you will probably feel more comfortable
with reference classes. They are newer than S4 classes and much newer
than S3 "classes" (which aren't really classes) and methods. Because
reference classes are newer, the support for them has not been as fully
developed and you may encounter warts from time to time.
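For someone coming from C++ or Python, a minimal sketch of a reference class may help. The class name Account and its deposit method are hypothetical; setRefClass, the generator's $new, the object's $copy and the use of <<- inside a method to assign to a field come from the methods package. Note that plain assignment gives a second reference to the same object, not a copy.

```r
# Hypothetical reference class used only for illustration.
# Fields are mutable; a method assigns to a field with `<<-`.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      "Add x to the balance, modifying the object in place."
      balance <<- balance + x
      invisible(.self)
    }
  )
)

a <- Account$new(balance = 100)
b <- a            # b refers to the same object as a (reference semantics)
a$deposit(50)
b$balance         # 150: the change made through a is visible through b

c2 <- a$copy()    # an explicit copy gives an independent object
a$deposit(10)
c2$balance        # still 150
```

This is the "create an object, then call methods that modify it" style described in the question; the copy() call is how you opt back into value semantics when you need them.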
I use both reference classes and S4 classes. Often I have objects
that represent model/data combinations for which the parameter
estimates are to be determined by optimizing a criterion. In those
cases it makes sense to me to use reference classes because the state
of the object can be changed by a method. I want to update the
parameters in the object and evaluate the estimation criterion without
needing to copy the entire object. If you try to perform some kind of
update operation on an S4 object without cheating in some way (i.e.
you adhere to strict functional programming semantics), you need to
create a new instance of the object each time you update it. When the
object is potentially very large, you find yourself worrying about
memory usage if you take that route. I found that my code started to
look pretty ugly because conceptually I was updating in place but the
code needed to be written as replacements.
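A sketch of the two styles for an estimation problem, assuming a toy model whose criterion is the residual sum of squares for a single location parameter mu. The class, generic and method names (MeanModelS4, setMu, rssS4, MeanModelRC, rss) are made up for illustration. The S4 version returns a modified object from every update and leaves the original untouched; the reference class version assigns to its field in place while the optimizer evaluates the criterion.

```r
## S4 style: updates return a modified copy; the caller must keep it.
setClass("MeanModelS4", slots = c(y = "numeric", mu = "numeric"))

setGeneric("setMu", function(object, value) standardGeneric("setMu"))
setMethod("setMu", "MeanModelS4", function(object, value) {
  object@mu <- value   # modifies the function's local copy
  object               # return the updated object
})

rssS4 <- function(object) sum((object@y - object@mu)^2)

m4 <- new("MeanModelS4", y = c(1, 2, 3, 6), mu = 0)
# Each evaluation of the criterion builds a new object via setMu().
fitS4 <- optimize(function(mu) rssS4(setMu(m4, mu)), interval = c(-10, 10))

## Reference class style: the method sets mu in place, then evaluates.
MeanModelRC <- setRefClass("MeanModelRC",
  fields = list(y = "numeric", mu = "numeric"),
  methods = list(
    rss = function(value) {
      "Set mu to value in place and return the residual sum of squares."
      mu <<- value
      sum((y - mu)^2)
    }
  )
)

mrc <- MeanModelRC$new(y = c(1, 2, 3, 6), mu = 0)
fitRC <- optimize(function(v) mrc$rss(v), interval = c(-10, 10))

fitS4$minimum   # close to 3, the mean of y
m4@mu           # still 0: the original S4 object was never changed
mrc$mu          # the last value the optimizer tried, stored in the object
```

With four observations the copies are harmless; the concern raised above is about objects that are large, where the reference class avoids building a new instance on every evaluation.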
Having said all that, you should realize that the style of programming
favored in R, and particularly in R packages, is to regard a method as
determined jointly by the generic function and the class(es) of the
argument(s). This is different from most other object-oriented
languages in which the class is paramount and a method is just a
member of a class that happens to be code, not data. You can get a
lot of mileage out of the idiom of defining methods for common
generics (print, plot, summary, ...) for particular S3 or S4 classes.
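A sketch of that idiom for a hypothetical S4 class Measurements (the class and its slots are invented for illustration): a method for the S4 generic show, which is what R calls to auto-print an S4 object, and a method for the S3 generic summary written in the ordinary S3 way but applied to the S4 class.

```r
# Hypothetical S4 class used only for illustration.
setClass("Measurements", slots = c(values = "numeric", units = "character"))

# Method for the S4 generic show(), used when the object is auto-printed.
setMethod("show", "Measurements", function(object) {
  cat("Measurements object:", length(object@values), "values in",
      object@units, "\n")
})

# Ordinary S3 method for the S3 generic summary(), for an S4 class.
summary.Measurements <- function(object, ...) {
  c(mean = mean(object@values), sd = sd(object@values))
}

x <- new("Measurements", values = c(1.2, 3.4, 2.2), units = "cm")
x            # auto-prints through the show() method
summary(x)   # dispatches to summary.Measurements
```

The class definition stays small; the behaviour users see comes from methods attached to generics they already know.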
The structure of R packages favors S3 generics, but you can define a
method for an S3 generic applied to an object from an S4 class. The
only restriction is that S3 generics can only dispatch on the first
argument, but that is what happens in a language where the methods are
part of the class definitions. When you need multiple dispatch, S4
generics and methods are worth the pain.
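A minimal sketch of multiple dispatch, with hypothetical classes Circle and Square and a hypothetical generic overlapKind. The method is chosen from the classes of both arguments, which an S3 generic, dispatching on the first argument only, cannot do directly.

```r
# Hypothetical classes and generic chosen for illustration.
setClass("Circle", slots = c(r = "numeric"))
setClass("Square", slots = c(side = "numeric"))

# An S4 generic that dispatches on the classes of BOTH arguments.
setGeneric("overlapKind", function(a, b) standardGeneric("overlapKind"))

setMethod("overlapKind", signature("Circle", "Circle"),
          function(a, b) "circle-circle")
setMethod("overlapKind", signature("Circle", "Square"),
          function(a, b) "circle-square")
# Reuse the (Circle, Square) method by swapping the arguments.
setMethod("overlapKind", signature("Square", "Circle"),
          function(a, b) overlapKind(b, a))

ci <- new("Circle", r = 1)
sq <- new("Square", side = 2)
overlapKind(ci, ci)  # "circle-circle"
overlapKind(sq, ci)  # "circle-square", selected using both argument classes
```

A call for a pair of classes with no method, such as two Square objects, signals an error rather than silently falling back to some other method.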
So my current approach is to use S4 classes for objects that are in
some way static but to use reference classes for objects that will
need to be updated when performing some kind of estimation (or other
operations such as Markov chain Monte Carlo).