[Bioc-devel] S4 initialize methods (was Re: "patches" for Gviz: utr plotting support and direct BamFile plotting)

Hahne, Florian florian.hahne at novartis.com
Thu Aug 23 08:47:25 CEST 2012

Uh, uh,
Seems like I am the bad guy still using the dreaded initialize methods
around here :-(
I do agree with most of what you guys say, but still want to put my two
cents here. My lawyers are preparing a more complete statement at this
point :-)

Having a constructor function to me somewhat implies that you want objects
from that class to be created by the user in a manual process. In more
complex class hierarchies you don't really want that, but rather you want
to pass through all the parent class' instantiations to fill the relevant
slots appropriately. Whenever you need something more complicated than
foo@a=b in these cases I do not see a way around the initializer. For
instance, in Gviz I have a whole bunch of classes that inherit from each
other, each of them grabbing the arguments to fill their slots while
objects are instantiated. The bottom-most of these classes will gobble up
all the arguments that are left over and stick them into a plotting
parameters object. I guess I could have explicit constructors for all of
those classes, and in those explicitly call the parent constructor, thus
walking through the hierarchy, doing whatever magic I need to do to make
things work. Now that doesn't strike me as particularly elegant either,
and I can't see how that would help with the code copying issue.
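
To make the pattern concrete, here is a minimal sketch of initialize
methods walking up a hierarchy. The classes Pars, BaseTrack and RangeTrack
are hypothetical stand-ins invented for illustration, not the actual Gviz
classes. Each method takes the arguments for its own slots and hands the
rest to the next method; the last one in the chain stores whatever is left
over as plotting parameters.

```r
library(methods)

## Hypothetical classes for illustration only (not the real Gviz classes).
setClass("Pars", representation(pars = "list"))
setClass("BaseTrack", representation(name = "character", dp = "Pars"))
setClass("RangeTrack", contains = "BaseTrack",
         representation(start = "numeric"))

## End of the chain: fill 'name', then collect every leftover argument
## into the plotting parameters object.
setMethod("initialize", "BaseTrack", function(.Object, name = "track", ...) {
    .Object@name <- name
    .Object@dp <- new("Pars", pars = list(...))
    .Object
})

## Subclass: take the arguments for its own slot, pass the rest upwards.
setMethod("initialize", "RangeTrack", function(.Object, start = numeric(), ...) {
    .Object@start <- start
    callNextMethod(.Object, ...)
})

trk <- new("RangeTrack", start = c(1, 10), name = "genes",
           col = "red", lwd = 2)
```

In this sketch trk@dp@pars ends up holding col and lwd, without RangeTrack
having to know which plotting parameters exist.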

Another remark regarding validation. For classes with a large memory
footprint I am very much worried about unnecessary copies of the data. For
objects with very light content that are created very often however I care
much more about fast object instantiation. Running through a validation
method each time you create an object adds quite some overhead to this
(and we all know that building S4 objects even without validators is not
cheap at all). I remember there were times when the use of validation
methods for classes was not recommended. And personally I am no big fan of
them for the reasons pointed out by Kasper and Martin before.
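
A small sketch of when the validity function actually runs, assuming a
made-up class Light and a counter variable (both hypothetical, introduced
only for this illustration). With the default initialize method, validation
happens when slots are supplied to new(), but not on a direct slot
assignment.

```r
library(methods)

calls <- 0L  # counts how often the validity function runs

## Hypothetical lightweight class with a validity function.
setClass("Light", representation(x = "numeric"),
         validity = function(object) {
             calls <<- calls + 1L
             if (length(object@x) > 1L) "x must have length 0 or 1" else TRUE
         })

before_new <- calls
a <- new("Light", x = 1)   # slots supplied: the default initialize validates
after_new <- calls

a@x <- c(1, 2)             # direct slot assignment: validity is not run
after_assign <- calls

## An explicit check catches the now-invalid object.
res <- tryCatch(validObject(a), error = function(e) "invalid")
```

So every new() call with slot arguments pays for one pass through the
validity function, while a direct slot assignment can leave an invalid
object behind unnoticed.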

That being said, I will take a closer look at my package to figure out a
way to code everything without the initialize methods and report back to
you guys about my success. I do not generally advertise the use of
initializers (as a matter of fact I am far, far away from that), I just
want to stand up here for these cuddly little creatures, threatened by
extinction, and make the point that they still do have their rightful place
in our Bioconductor ecosystem...


On 8/22/12 11:04 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
wrote:

>On Wed, Aug 22, 2012 at 4:47 PM, Steve Lianoglou <
>mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>> And since we're already getting pretty deep into the woods, I guess it
>> can't hurt to keep going:
>> On Wed, Aug 22, 2012 at 4:11 PM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>> > Martin has a great technical explanation.
>> >
>> > A briefer explanation is the following.
>> >
>> > 'We' used to use an initialize method to construct new objects, like
>> >   new("ExpressionSet", exprs = MATRIX)
>> >
>> > This paradigm is used in a number of packages, including Biobase.
>> >
>> > 'We' later realized - for reasons Martin explains below - that this is
>> > prone to failure and should not be used.  However, you can still find
>> > tons of code using it - for legacy reasons.
>> >
>> > In general, you should not define the initialize method, you should
>> > set a prototype when you define the class and you should write an
>> > explicit constructor, like
>> >   ExpressionSet <- function() {}
>> With the exception when your class has slots that are environments.
>> If it's true that some things will still call new("YourClass", ...)
>> that aren't your constructor, then you will be surprised:
>> setClass("A", representation=representation(cache="environment"),
>> prototype=prototype(cache=new.env()))
>> ctr <- function(cache=new.env()) new("A", cache=cache)
>> ## This is the behavior you probably expect:
>> a = ctr()
>> b = ctr()
>> a@cache[['a']] = 1
>> b@cache[['a']]
>> ## This isn't
>> y = new("A")
>> z = new("A")
>> y@cache[['a']] = 1
>> z@cache[['a']]
>> [1] 1    ## Woops!
>> But if you set an appropriate initialize method:
>> setMethod(initialize, "A", function(.Object, ..., cache=new.env()) {
>>   .Object@cache <- cache
>>   callNextMethod(.Object, ...)
>> })
>> All is well:
>> y = new("A")
>> z = new("A")
>> y@cache[['a']] = 1
>> z@cache[['a']]
>This is of course true, but there are also the many reasons Martin gives
>for why using the initialize method is not that great.  I mean, of course
>there are advantages to the initialize method - this is after all why it
>was used and recommended for years in Bioconductor.
>If you use a class that has an explicit constructor, you should use that
>constructor, period.  That you will mess up your objects by not doing so
>ought to be self-evident.  This is kind of the same as the fact that for
>many classes it is possible to mess them up by directly assigning
>things to their slots (remember that validObject is not called every
>time by default).
>> I think ReferenceClasses replaces (most(?)) of the use cases that I
>> use the `cache` idiom for, although I'm not sure about the gotchas
>> with them because I haven't tried to grok RefClasses yet.
>> Still ... thought I'd point this out (I think it was actually one of
>> you two, who must have alerted me to this years ago (perhaps on
>> R-devel)).
>> -steve
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
