[Bioc-devel] Need for BiocGenerics?

Martin Morgan mtmorgan at fhcrc.org
Tue Sep 6 19:10:01 CEST 2011

On 09/05/2011 08:44 PM, Steve Lianoglou wrote:
 > Hi Kasper and Martin,
 > As for Kasper's suggestion:
 >> Now, one solution for Bioconductor, is to have a single package just
 >> containing standard generics for the project.  This used to be
 >> Biobase, so that all packages using some basic S4 depended on Biobase.
 >> Now, with IRanges etc. and the many additions, this is not true
 >> anymore.  One possible "solution" going forward is to get some
 >> consensus about the signature and then design a single base
 >> Bioconductor package containing simply a lot of calls to setGeneric.  Of
 >> course, this will not necessarily help your use case a lot, but it
 >> might at least provide more sanity with Bioc.
 > I can see the value in that as I've found some functions are worth
 > having S4-ized off the bat and don't necessarily need all (any -- at
 > times) of the other functionality that the packages they are defined
 > in provide.
 > For instance, things like:
 > genome (setGeneric defined in rtracklayer)
 > strand (from GenomicRanges)
 > But this seems like a discussion that is a whole different can of
 > worms and definitely worthy of its own thread. The examples I provide
 > aren't all that great either since you'll likely be depending on these
 > packages already if you know what's good for you ;-)

OK, to peek into that can of worms...

It seems like there are 'domain-specific' generics, like strand in the
sequencing domain and exprs in the expression array domain. These seem
to be handled by defining them in packages at the appropriate level of
(our) package hierarchy for the corresponding types (GenomicRanges,
Biobase). Especially with GenomicRanges, there could be an argument for
a lighter-weight commitment than depending on the whole package; still,
I don't think you'd want a single package with both exprs and strand
defined (?).

And then there are generics that R should really be taking care of,
e.g., the base functions that IRanges masks by redefining them as S4
generics:

     cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
     pmin, pmin.int, rbind, rep.int, setdiff, table, union

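as well as a small number of generics (maybe the only example is
updateObject) that are currently created in both Biobase and IRanges.
Perhaps there is a role for a BiocGenerics package? As Steve points out,
this wouldn't help in his case, but it would help avoid conflicts across
Bioc packages: when two packages each create a generic with the same
name, each carries its own distinct generic, and whichever package is
attached last masks the other.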
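To make the conflict point concrete, here is a minimal R sketch (not code from Biobase, IRanges, or any existing package) of a generic defined once and then extended by two packages. The classes ClassA and ClassB, their slots, and the updateObject signature used here are hypothetical; in real packages the generic would live in the shared package and the others would import it through their NAMESPACE rather than calling setGeneric() themselves.

```r
## Minimal sketch under stated assumptions: one shared definition of a
## generic, with methods contributed by two (simulated) packages.
library(methods)

## The single definition a BiocGenerics-style package would own.
## This signature is an assumption for illustration only.
setGeneric("updateObject",
           function(object, ..., verbose = FALSE) standardGeneric("updateObject"))

## "Package A": hypothetical class ClassA adds a method to the shared
## generic instead of creating its own generic.
setClass("ClassA", slots = c(version = "character"))
setMethod("updateObject", "ClassA",
          function(object, ..., verbose = FALSE) {
              if (verbose) message("updating ClassA")
              object@version <- "2.0"
              object
          })

## "Package B": hypothetical class ClassB extends the same generic.
setClass("ClassB", slots = c(n = "integer"))
setMethod("updateObject", "ClassB",
          function(object, ..., verbose = FALSE) {
              if (verbose) message("updating ClassB")
              object@n <- object@n + 1L
              object
          })

## Both methods hang off one generic, so dispatch finds either one.
## Had each package called setGeneric("updateObject") in its own
## namespace, there would be two distinct generics, and whichever
## package was attached last would mask the other.
a <- updateObject(new("ClassA", version = "1.0"))
b <- updateObject(new("ClassB", n = 1L))
showMethods("updateObject")
```

The design choice being illustrated is simply that only one package calls setGeneric(); everyone else calls setMethod() on that shared generic.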

This also opens the door to a BiocClasses package, which is a theme that 
came up during Developer Day at BioC 2011 -- what are the classes that 
one is supposed to reuse? This is much trickier. For instance, I 
personally like the SimpleList hierarchy in IRanges. It provides a 
consistent way to annotate elements (via elementMetadata()) and objects
(via metadata()), and in particular I think that IRanges::DataFrame is more
generally useful and better implemented than
Biobase::AnnotatedDataFrame, but it is not without its limitations
(it's not a drop-in replacement for data.frame, e.g., in xtabs() or lm()).

Comments welcome.

Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
