[Bioc-devel] S4 initialize methods (was Re: "patches" for Gviz: utr plotting support and direct BamFile plotting)

Martin Morgan mtmorgan at fhcrc.org
Wed Aug 22 21:37:44 CEST 2012

Steve and I exchanged a little email about S4 initialize methods that it 
might help to share. Steve created an initialization method

setMethod("initialize", "BamTrack",
function(.Object, bam, cache=new.env(), range.strict=FALSE, ...) {
   if (missing(bam) || !is(bam, "BamFile") || !file.exists(path(bam))) {
     stop("bam required during initialize,BamTrack")
   }
   cache$bam <- bam
   .Object@cache <- cache
   callNextMethod(.Object=.Object, ...)
})

Steve tells me that this is similar to Gviz coding style. I think there 
are several issues here.

The first is that creating a sub-class actually tries to create an 
instance of the parent class, and to do that new("BamTrack") has to 
succeed. It doesn't, because 'bam' does not have a default value:

 > setClass("BamSubtrack", contains="BamTrack")
Error in .local(.Object, ...) : bam required during initialize,BamTrack
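
One way to sidestep this is to give every extra argument of the initialize method a default, so that new() succeeds with no arguments. The following is only a minimal sketch: ToyTrack, ToySubtrack and the 'path' slot are hypothetical stand-ins, not the real BamTrack or anything from Gviz.

```r
library(methods)

# Hypothetical toy class standing in for BamTrack (assumption, not Gviz code).
setClass("ToyTrack", representation(path="character", cache="environment"))

setMethod("initialize", "ToyTrack",
    function(.Object, ..., cache=new.env())
{
    # 'cache' has a default and sits after '...', so new("ToyTrack")
    # works with no arguments and each object gets its own environment.
    callNextMethod(.Object, ..., cache=cache)
})

obj <- new("ToyTrack")                         # succeeds without arguments
setClass("ToySubtrack", contains="ToyTrack")   # so sub-classing is possible
sub <- new("ToySubtrack", path="reads.bam")    # 'path' passes through '...'
```

The sub-class inherits the initialize method, and slot values such as 'path' still reach the default method through '...'.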

A second issue involves validation, which is what the check for 
missing(bam) etc. amounts to. It makes more sense to place this check in 
the object's validity method so that the code can be re-used, perhaps 
providing a prototype that initializes the bam field properly.
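
As a rough illustration of moving the check into a validity method, here is a sketch with a hypothetical FileTrack class whose 'path' slot plays the role of the BamFile. The NA prototype and the rule that NA means "no file yet" are assumptions made for the example.

```r
library(methods)

# Hypothetical class; 'path' stands in for the BamFile (assumption).
setClass("FileTrack",
    representation(path="character"),
    prototype=prototype(path=NA_character_),   # a valid "empty" state
    validity=function(object) {
        if (length(object@path) != 1L)
            "'path' must be length 1"
        else if (!is.na(object@path) && !file.exists(object@path))
            "'path' must name an existing file"
        else TRUE
    })

validObject(new("FileTrack"))          # the prototype itself is valid

existing <- tempfile()
file.create(existing)
trk <- new("FileTrack", path=existing) # validity is checked here

absent <- tempfile()                   # a name that does not exist on disk
bad <- tryCatch(new("FileTrack", path=absent), error=conditionMessage)
```

No initialize method is needed; the same check also runs whenever validObject() is called on an existing object.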

While on validity and prototypes, a weird thing is that the class 
definition can specify an invalid prototype and, since validity is only 
checked if the user provides additional arguments to 'new' / 
'initialize', it's possible to create invalid objects

   setClass("A", representation(x="numeric"))
   setValidity("A", function(object) {
       if (length(object@x) != 1L) "'x' must be length 1" else NULL
   })

and then

 > a = new("A")

seems to work but

 > validObject(a)
Error in validObject(a) : invalid class "A" object: 'x' must be length 1

the solution is to provide a prototype that creates a valid object

   setClass("A", representation(x="numeric"), prototype=prototype(x=1))

and the acid test is validObject(new("A")) == TRUE
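
To see that validity is only checked when slots are supplied, here is a small sketch using a hypothetical class Len1 (named that way only to avoid clobbering "A"). It uses validObject(test=TRUE), which returns the problem descriptions instead of signalling an error.

```r
library(methods)

# Hypothetical class with a prototype that violates its own validity method.
setClass("Len1", representation(x="numeric"))
setValidity("Len1", function(object) {
    if (length(object@x) != 1L) "'x' must be length 1" else TRUE
})

a <- new("Len1")                          # no slots supplied: no validity check
problems <- validObject(a, test=TRUE)     # character vector of problems

# Supplying a slot triggers the validity check inside initialize.
err <- tryCatch(new("Len1", x=c(1, 2)), error=conditionMessage)
```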

A third and even more obscure issue is that 'initialize' is advertised 
to take unnamed arguments as instances of parent classes that are used 
to initialize derived classes, so it makes sense to avoid accidentally 
capturing un-named arguments by placing 'bam' and friends _after_ ... 
Let's see...

   setClass("A", representation(x="numeric"))
   setClass("B", representation(y="numeric"), contains="A")

and then

 >   new("B", new("A", x=2))
An object of class "B"
Slot "y":
numeric(0)

Slot "x":
[1] 2


   setMethod(initialize, "B", function(.Object, y, ...)
       callNextMethod(.Object, y=y, ...))

and now the copy constructor is broken.

 >   new("B", new("A", x=2))
Error in validObject(.Object) :
   invalid class "B" object: invalid object for slot "y" in class "B": 
got class "A", should be or extend class "numeric"
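
For comparison, a sketch of the same kind of method with 'y' placed after '...', using hypothetical classes Parent and Child in place of A and B. The as.numeric() coercion is only there to give the method something to do, and the missing() test is one possible way to avoid overwriting a copied slot.

```r
library(methods)

# Hypothetical classes mirroring A and B above.
setClass("Parent", representation(x="numeric"))
setClass("Child", representation(y="numeric"), contains="Parent")

setMethod("initialize", "Child", function(.Object, ..., y) {
    # 'y' comes after '...', so it can only be matched by name and
    # unnamed arguments (parent instances) stay in '...'.
    if (missing(y))
        callNextMethod(.Object, ...)
    else
        callNextMethod(.Object, ..., y=as.numeric(y))
})

b  <- new("Child", new("Parent", x=2))           # copy construction works
b2 <- new("Child", new("Parent", x=2), y="3")    # named 'y' is coerced
```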

Another point about initialize as a copy constructor is that it updates 
multiple slots in a (relatively) efficient way -- only 1 copy of the 
object, rather than once for each slot assignment

   removeMethod("initialize", "B")

and then

 > b = new("B")
 > tracemem(b)
[1] "<0x53d74e0>"
 > b@x = 1
tracemem[0x53d74e0 -> 0x54096c8]:
 > b@y = 2
tracemem[0x54096c8 -> 0x540b0c0]:

so a copy on each slot assignment, vs.

 > b1 = new("B"); tracemem(b1)
[1] "<0x540e968>"
 > initialize(b1, x=1, y=2)
tracemem[0x540e968 -> 0x541c628]: initialize initialize
An object of class "B"
Slot "y":
[1] 2

Slot "x":
[1] 1
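
A related sketch, without tracemem (which needs an R build with memory profiling): updating several slots through initialize() also runs the validity method on the final state, whereas direct slot assignment only checks the class of the slot. The Pair class is hypothetical.

```r
library(methods)

# Hypothetical class with a constraint that spans two slots.
setClass("Pair", representation(x="numeric", y="numeric"),
    validity=function(object) {
        if (length(object@x) != length(object@y))
            "'x' and 'y' must have the same length"
        else TRUE
    })

p  <- new("Pair")
p2 <- initialize(p, x=c(1, 2, 3), y=c(4, 5, 6))  # both slots in one call

# Updating only one slot through initialize is caught by validity...
err <- tryCatch(initialize(p, x=c(1, 2)), error=conditionMessage)

# ...whereas a direct slot assignment silently leaves an invalid object.
q <- p
q@x <- c(1, 2)
problems <- validObject(q, test=TRUE)
```

Note that 'p' itself is unchanged; initialize() returns a modified copy.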

Combined, these are enough to make one want to think very carefully 
about writing initialize methods; often a 'Constructor' is the right 
place to do argument coercion, etc., (although sometimes I think the 
constructor is avoiding some of its responsibility, e.g., BamFile() 
fails, but validObject(new("BamFile")) == TRUE) and validity methods the 
correct place to check validity.
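
A minimal sketch of that division of labour, with a hypothetical Track class: the constructor function coerces its arguments and calls new(), the validity method checks the result, and there is no initialize method at all. All names and the 'width' rule are assumptions for illustration.

```r
library(methods)

# Hypothetical class; the prototype is chosen so that new("Track") is valid.
setClass("Track",
    representation(name="character", width="numeric"),
    prototype=prototype(name="track", width=1),
    validity=function(object) {
        if (length(object@width) != 1L || is.na(object@width) ||
            object@width <= 0)
            "'width' must be a single positive number"
        else TRUE
    })

# Constructor: argument coercion lives here, not in initialize.
Track <- function(name="track", width=1, ...) {
    new("Track", name=as.character(name), width=as.numeric(width), ...)
}

trk <- Track("reads", width="2.5")      # character width is coerced
err <- tryCatch(Track(width=-1), error=conditionMessage)
validObject(new("Track"))               # the acid test from above
```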


On 08/21/2012 11:12 PM, Steve Lianoglou wrote:
> Hi Florian (and other interested Gviz'ers),
> I thought I'd use Gviz to whip up pretty plots for my thesis (yay!)
> where I need to plot lots of NGS data over 3'UTRs.
> I wanted to tackle the "more standard" drawing of UTRs (thin exons
> (vs. thick coding)) in gene regions as well as making repeated
> plotting of the same data over different regions easier -- there is
> also an unplanned for increase in plotting speed of ~ 5-8x
> (unscientific benchmark) when plotting gene regions using my TxDbTrack
> vs. GeneRegionTrack.
> I have a more thorough summary of what I did here:
> http://cbio.mskcc.org/~lianos/files/bioc/Gviz/Gviz-enhancement-1.html
> With the relevant pics at the bottom. It's still a work in progress
> but I thought I'd put it out there now to see if you think it'd be
> useful for patching back into Gviz -- I'd be happy to groom things
> further to make it easier to add back into Gviz, or change things to
> make the approach more "inline" with the coding style/philosophy of
> the package (which I tried to stick to).
> Thanks (again) for this package -- it's really great.
> -steve

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
