[Bioc-devel] Need for BiocGenerics?
Martin Morgan
mtmorgan at fhcrc.org
Tue Sep 6 19:10:01 CEST 2011
On 09/05/2011 08:44 PM, Steve Lianoglou wrote:
> Hi Kasper and Martin,
>
> As for Kasper's suggestion:
>
>> Now, one solution for Bioconductor, is to have a single package just
>> containing standard generics for the project. This used to be
>> Biobase, so that all packages using some basic S4 depended on Biobase.
>> Now with IRanges etc. and the many additions this is not true
>> anymore. One possible "solution" going forward is to get some
>> consensus about the signature and then design a single base
>> Bioconductor containing simply a lot of calls to setGeneric. Of
>> course, this will not necessarily help your use case a lot, but it
>> might at least provide more sanity with Bioc.
>
> I can see the value in that as I've found some functions are worth
> having S4-ized off the bat and don't necessarily need all (any -- at
> times) of the other functionality that the packages they are defined
> in provide.
>
> For instance, things like:
>
> genome (setGeneric defined in rtracklayer)
> strand (from GenomicRanges)
>
> But this seems like a discussion that is a whole different can of
> worms and definitely worthy of its own thread. The examples I provide
> aren't all that great either since you'll likely be depending on these
> packages already if you know what's good for you ;-)
OK, to peek into that can of worms...
Seems like there are 'domain specific' generics like strand in the
sequencing domain, exprs in the expression array domain. These seem to
get handled by their definition in packages at an appropriate level in
(our) package hierarchy for handling corresponding types (GenomicRanges,
Biobase). Especially w/ GenomicRanges there could be an argument that
one wants a lighter-weight commitment; I don't think you'd want a
package w/ exprs and strand both defined (?).
And then there are generics that R should really be taking care of,
e.g., those 'mask'ed by IRanges:

    cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
    pmin, pmin.int, rbind, rep.int, setdiff, table, union

as well as a small number of generics (maybe the only example is
updateObject) that are currently created in both Biobase and IRanges.
Perhaps there is a role for a BiocGenerics package? As Steve points out,
this wouldn't help in his case, but would help avoid conflicts across
Bioc packages.
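
To make the idea concrete, here is a minimal sketch of what such a package might amount to, assuming it contains nothing but setGeneric() calls. The class names ToyRanges and ToySet and the two "packages" are hypothetical stand-ins; this is not the layout of any existing Bioconductor package, only an illustration using the methods package that ships with R.

```r
library(methods)

# Hypothetical contents of a shared generics package: only setGeneric() calls.
# A 'domain specific' generic, defined from scratch:
setGeneric("strand", function(x, ...) standardGeneric("strand"))

# An existing base function promoted to an S4 generic exactly once;
# base::union remains the default method.
setGeneric("union")

# Hypothetical "package A": adds a method to the shared generic 'strand'.
setClass("ToyRanges", representation(strand = "character"))
setMethod("strand", "ToyRanges", function(x, ...) x@strand)

# Hypothetical "package B": adds a method to the shared generic 'union'
# without creating a second, conflicting generic of its own.
setClass("ToySet", representation(elts = "character"))
setMethod("union", signature("ToySet", "ToySet"),
          function(x, y) new("ToySet", elts = unique(c(x@elts, y@elts))))

r <- new("ToyRanges", strand = c("+", "-"))
s <- union(new("ToySet", elts = c("a", "b")),
           new("ToySet", elts = c("b", "c")))

strand(r)        # dispatches to the ToyRanges method
s@elts           # elements of the merged ToySet
union(1:3, 2:5)  # plain vectors still fall through to base::union
```

Because both toy "packages" attach methods to one generic object, neither masks the other; the conflict only arises when each package calls setGeneric() for the same name independently.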
This also opens the door to a BiocClasses package, which is a theme that
came up during Developer Day at BioC 2011 -- what are the classes that
one is supposed to reuse? This is much trickier. For instance, I
personally like the SimpleList hierarchy in IRanges. It provides a
consistent way to annotate elements (via elementMetadata()) and objects
(metadata()) and in particular I think that IRanges::DataFrame is more
generally useful and better implemented than
Biobase::AnnotatedDataFrame, but it is not without its limitations
(it's not a drop-in replacement for data.frame, e.g., in xtabs() or lm()).
Comments welcome.
Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793