[Rd] Objects in R

Duncan Murdoch murdoch at stats.uwo.ca
Sun Apr 24 08:54:44 CEST 2005


Byron Ellis wrote:
> 
> On Apr 21, 2005, at 9:37 AM, Nathan Whitehouse wrote:
> 
>>
>>
>>   (1) Novices simply don't understand it.  Students are
>> trained in "standard" object-oriented technique and
>> this wonkish offshoot (puritanical functional
>> programming) just increases the information costs to
>> using R and thus decreases the demand.
> 
> 
> The obvious solution here is to avoid the phrase "object oriented," 
> since that apparently means "acts like Java" these days.

That's actually a good suggestion.  In S4, objects don't own methods, 
generics do, so it might be more informative to call it "generic 
oriented".
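
To make "generics own methods" concrete, here is a minimal sketch. The 
classes Circle and Square and the generic area() are made up for the 
example and come from no package:

```r
library(methods)

# Hypothetical classes, for illustration only.
setClass("Circle", representation(r = "numeric"))
setClass("Square", representation(side = "numeric"))

# The generic is an ordinary function; methods are registered on it,
# not stored inside the class definitions above.
setGeneric("area", function(shape) standardGeneric("area"))

setMethod("area", "Circle", function(shape) pi * shape@r^2)
setMethod("area", "Square", function(shape) shape@side^2)

sq_area <- area(new("Square", side = 2))        # 4
has_circle <- existsMethod("area", "Circle")    # TRUE: looked up via the generic
```

Note that the call is area(x), not x$area(): the object is just an 
argument, and the generic decides which method runs.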

>>   (2) Large frameworks benefit from
>> serializable/storable objects which contain both
>> functionality and modifiable values.  S4 stores
>> "class" information and R.oo does not upon
>> "save()"ing, but there are still real hindrances to
>> "trading" objects, which is -extraordinarily-
>> important in creating industrial-variety R-based
>> analysis.
>>   The classical example in my mind is the difficulties
>> in implementing a "visitor" pattern in S4.
> 
> 
> I'm not sure what your complaint about serialization is exactly. 
> Serialization is just a way of storing data---it's not like Java 
> serialization is actually putting *code* into data objects or anything, 
> so S4's method of saving objects is more or less equivalent. Yes, you 
> have to have the class definition around to get the object back in a 
> reasonable way, but this is true of any language.
> 
> As for the visitor pattern, the need to implement the visitor pattern is 
> actually a hack needed for single dispatch object systems to overcome 
> that limitation (typically implementing some sort of double dispatch 
> system). This is a classical example of why you really want S4's style 
> of OO and NOT GangOfFour style OO, not the other way 'round. You simply 
> implement the methods directly with a signature of length 2 instead of a 
> signature of length 1.

I disagree with this.  Dispatch on multiple arguments is not 
inconsistent with Java-style OOP.  It would be called "function 
overloading".

I don't know if there are languages that do this (i.e. do function 
overloading based on run-time type, rather than declared type), but 
that's likely because I don't know a lot of languages, not because they 
don't exist.
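
For reference, this is roughly what a signature of length 2 looks like in 
S4. It is only a sketch: the classes Animal, Cat and Dog and the generic 
meet() are hypothetical. The method is chosen from the run-time classes of 
both arguments:

```r
library(methods)

# Hypothetical class hierarchy, for illustration only.
setClass("Animal", representation("VIRTUAL"))
setClass("Cat", representation(name = "character"), contains = "Animal")
setClass("Dog", representation(name = "character"), contains = "Animal")

# One generic, dispatching on both arguments.
setGeneric("meet", function(a, b) standardGeneric("meet"))

setMethod("meet", signature("Cat", "Cat"), function(a, b) "cat/cat")
setMethod("meet", signature("Cat", "Dog"), function(a, b) "cat/dog")
# Fallback, reached by inheritance when no more specific pair is defined.
setMethod("meet", signature("Animal", "Animal"), function(a, b) "fallback")

# The list elements have no declared type; dispatch uses their actual class.
pets <- list(new("Cat", name = "Tom"), new("Dog", name = "Rex"))
r1 <- meet(pets[[1]], pets[[2]])   # "cat/dog"
r2 <- meet(pets[[2]], pets[[1]])   # "fallback"
```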

The big advantage of Java-style OOP is that it allows a clear definition 
of what is needed in order to be a valid descendant class.  For example, 
if I want to be a descendant of a "vector", I would need to implement 
index lookup and assignment, a way to print when in a data.frame, etc.

With S4, it's not so clear what my class needs to do to be a vector. 
Suppose I call my class MyVector, and get it working, submitted to CRAN, 
etc.

Independently, you create a new generic that is supposed to work on 
vectors.  You are unaware of my work, so you don't create a method for 
MyVector, and I'm unaware of your work so I don't create one either. 
When someone else tries to use both of our packages, they don't work 
together.

In Java-style OOP, on the other hand, you couldn't change the 
requirements for MyVector.  If it did the things that were required by 
the original class definition properly, then it would work with your 
code (since you couldn't require anything beyond the original 
definition).  If you needed methods not in the original, you would have 
to declare a new class, and there wouldn't be a risk of a third party 
getting burned by mixing my code with yours.
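
A small sketch of the S4 scenario, with both "packages" collapsed into one 
script. The slot name values and the generic spread() are invented for 
the example:

```r
library(methods)

# "My package": a vector-like class (hypothetical) with a length() method.
setClass("MyVector", representation(values = "numeric"))
setMethod("length", "MyVector", function(x) length(x@values))

# "Your package": a new generic meant to work on vectors, written
# without knowledge of MyVector.
setGeneric("spread", function(x) standardGeneric("spread"))
setMethod("spread", "numeric", function(x) max(x) - min(x))

v <- new("MyVector", values = c(1, 5, 3))
n <- length(v)   # 3: works, I provided this method

# No method for MyVector exists, so dispatch fails at run time.
res <- tryCatch(spread(v), error = function(e) conditionMessage(e))

# The third party has to supply the missing glue themselves.
setMethod("spread", "MyVector", function(x) spread(x@values))
fixed <- spread(v)   # 4
```

Nothing in the definition of MyVector told me that spread() would one day 
be part of "being a vector", which is the point above.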


>>   (3) The absence of references means for large
>> datasets and long "analysis flows," there is (1) a
>> hideous amount of memory used storing each predecessor
>> analysis or (2) there are awkward "references" that
>> I've seen used like storing the name of the reference
>> object in a data slot.
>>   I find the use of environments in R.oo as opposed to
>> the glorified LISTSXP of S4 to be a satisfying way
>> around this.
> 
> 
> True, though this has little to do with objects per-se, it has to do 
> with memory management semantics that exist independently of the object 
> system. Frankly, for large datasets you really want to be doing your 
> analytics in some sort of database but weaning people from Excel has 
> proven to be even more daunting than convincing them that "object 
> oriented" is like "vehicular transportation"---it comes in many forms.
>
>>   S4 is a nice step forward.  But R should be open to
>> further evolution.  The design choices for S4 and the
>> reasons behind abandoning OOP have never been
>> adequately justified to my knowledge.  Instead most
>> inquiries have been met by a Sphinx-like silence by
>> the core community.
> 
> 
> Abandoning OOP how? S4 is just as object oriented as S3 (more so) and 
> is certainly as object based as Java or C++. Sure it doesn't really act 
> like the Java/C++ style of OO, but to paraphrase the famous Alan Kay 
> quote, "I coined the term object oriented and I sure wasn't thinking of 
> C++ when I did."

What S4 is missing is "encapsulation". Wikipedia's article on 
object-oriented programming gives a good definition:

"Encapsulation - Ensures that users of an object cannot change the 
internal state of the object in unexpected ways; only the object's own 
internal methods are allowed to access its state. Each object exposes an 
interface that specifies how other objects may interact with it."

Neither of these properties holds in S4.
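
A minimal sketch of the first point, using a made-up Account class. Any 
code can assign to a slot with @, and the validity function is not 
consulted on that assignment (only the class of the new value is checked):

```r
library(methods)

# Hypothetical class with an invariant: balance must not be negative.
setClass("Account", representation(balance = "numeric"),
         validity = function(object) {
           if (object@balance < 0) "balance must be non-negative" else TRUE
         })

acct <- new("Account", balance = 100)   # validity is checked here

# Direct slot assignment from outside the class: accepted without complaint.
acct@balance <- -50

# The invariant is now broken; it is noticed only if someone asks.
still_valid <- isTRUE(validObject(acct, test = TRUE))   # FALSE
```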

Duncan Murdoch

>>   But the hindrances faced by our friend Ali are
>> common, and even in packages maintained by experienced
>> R developers, S4 is implemented shall we say curiously
>> as per the specs.
>>   Clearly OOP and R.oo are not the final answer.  But
>> some serious discussion about why packages like R.oo
>> which "layer" onto the standard functional R are
>> inappropriate is in order.
>>
>>   It would be great to see R emerge from its niche
>> audience.  I believe that would aid statisticians and
>> programmers.  However, a little bit more transparency
>> and something beyond a categorical "we just don't like
>> that way of doing things" would go a long way towards
>> growing the base community of R.
>>
>>   Cheers,
>>   Nathan Whitehouse
>>   Formerly of Baylor College of Medicine.
>>
>> Ali, maybe we R-core members are not decent enough.
>> But we strongly believe that we don't want to advocate yet
>> another object system additionally to the S3 and S4 one,
>> and several of us have given talks and classes, even written
>> books on how to do "decent" object oriented programming
>> `just' with the S3 and/or S4 object system.
>>
>> No need of additional "oo" in our eyes.
>> Your main problem is that you assume what "oo" means {which may
>> well be true} but *additionally* you also assume that OO has to
>> be done in the same way you know it from Python, C++, or Java..
>>
>> Since you are new, please try to learn the S4 way,
>> where methods belong to (generic) functions more than
>> to classes in some way, particularly if you compare with other
>> OO systems where methods belong entirely to classes.
>> This is NOT true for R (and S-plus) and we don't want this to
>> change {and yes, we do know about C++, Python, Java,... and
>> their way to do OO}.
>>
>> Please also read in more details the good advice given by Tony
>> Plate and Sean Davis.
>>
>> Martin Maechler,
>> ETH Zurich
>>
>>
>>
>> Nathan Whitehouse
>> nlwhitehouse at yahoo.com
>>
>> ______________________________________________
>> R-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> ---
> Byron Ellis (ellis at stat.harvard.edu)
> "Oook" -- The Librarian
> 
