[Bioc-devel] implementing interfaces

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 26 17:04:04 CET 2008


Robert Gentleman <rgentlem at fhcrc.org> writes:

> Leaping in - where it might be best not to,
> the point that James is making is that, if we have a class, foo, on which we
> have defined a number of operations (eg exprs(x), pData(x) etc) that retrieve
> somewhat well defined entities, then an alternative to a class-centric method of
> dispatch is to rely on the accessors (what James is calling an interface).

I agree with the spirit of James' suggestion: programming to an
interface helps abstract away from the underlying data. But I also
agree with Sean that a class defined by having specific methods
implemented is susceptible to 'corruption' when developers add new
methods for the interface, or otherwise don't follow the implied
convention. I thought perhaps classes provide something more
immutable, and that an interface and its implementation might
reasonably be represented by classes, with appropriate validity
methods / coercion etc. to get the desired behavior. To that end...

I wrote the following interface and implementation API, which enforces
(not completely at the moment, could be more so) some of these ideas:

> source("Interface.R")

> ## define an interface
> setInterface("AnInterface",
+              foo=function(x, ...) stop("not implemented"),
+              bar=function(x, y=1, ...) stop("not implemented"))

> ## create a class and (partially) implement the interface
> setClass("A", representation=representation(x="numeric"))
[1] "A"

> setImplementation("A", "AnInterface", foo=function(x, ...) {
+     sum(slot(x, "x"))
+ })

> getImplementation("A", "AnInterface")
class: implements 'AnInterface' of class 'A' 
interface:
  foo(x=, ...=)
  bar(x=, y=1, ...=)

> setGeneric("baz", function(intrfc, ...) standardGeneric("baz"))
[1] "baz"

> ## dispatch on the interface
> setMethod("baz", "AnInterface", function(intrfc, ...) {
+     ## use the implementation
+     impl <- getImplementation(intrfc, "AnInterface")
+     impl$foo(intrfc)
+ })
[1] "baz"

> a <- new("A", x=1:5)

> baz(a)
[1] 15

I don't think the details are necessarily correct, and there is still
'convention' required, in the form of explicitly obtaining and using
the implementation. But maybe this is a bit more robust than relying
on methods alone.
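
On the remaining 'convention': one cheap safeguard is to check, before
relying on a class, which of the generics you need actually have methods
for it (roughly James' adheresTo()). A minimal sketch, assuming only the
methods package; missingMethods(), adheresTo(), and the Square / area /
perimeter names are all made up for illustration:

```r
library(methods)

## Hypothetical helper: which of the 'required' generics have no
## method that hasMethod() can find for 'class'?
missingMethods <- function(class, required) {
    ok <- vapply(required, function(f) {
        isGeneric(f) && hasMethod(f, class)
    }, logical(1))
    required[!ok]
}

## Hypothetical adheresTo(): TRUE when nothing is missing
adheresTo <- function(class, required) {
    length(missingMethods(class, required)) == 0L
}

## Toy generics and a class that implements only one of them
setGeneric("area", function(shape, ...) standardGeneric("area"))
setGeneric("perimeter", function(shape, ...) standardGeneric("perimeter"))
setClass("Square", representation(side = "numeric"))
setMethod("area", "Square", function(shape, ...) shape@side^2)

missingMethods("Square", c("area", "perimeter"))   # "perimeter"
adheresTo("Square", "area")                        # TRUE
```

This only answers for whatever packages are loaded at the time of the
call, so it does not remove the dynamic nature of the 'interface'; it
just makes the gap visible.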

The attached Interface.R is the definition, Interface_test.R the above
script.
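
For anyone reading without the attachment, here is a rough sketch of how
an API of this shape could be put together. It is a guess at the general
idea and not the contents of Interface.R: the .registry environment, the
use of setIs(), and all function bodies are assumptions made for
illustration, and there is no show method, so the printed output of
getImplementation() in the session above is not reproduced.

```r
library(methods)

## Hypothetical registry: interface defaults and per-class implementations
.registry <- new.env(parent = emptyenv())

setInterface <- function(name, ...) {
    ## a virtual class gives S4 dispatch something to select on
    setClass(name, representation("VIRTUAL"))
    assign(name, list(...), envir = .registry)
    invisible(name)
}

setImplementation <- function(class, interface, ...) {
    impl <- get(interface, envir = .registry)   # start from the defaults
    given <- list(...)
    extra <- setdiff(names(given), names(impl))
    if (length(extra) > 0L)
        stop("not part of '", interface, "': ",
             paste(extra, collapse = ", "))
    impl[names(given)] <- given                 # override what is implemented
    assign(paste(class, interface, sep = ":"), impl, envir = .registry)
    setIs(class, interface)                     # 'class' now extends the interface
    invisible(class)
}

getImplementation <- function(x, interface) {
    ## accept a class name or an instance (exact class only, no subclasses)
    cls <- if (is.character(x)) x else class(x)
    get(paste(cls, interface, sep = ":"), envir = .registry)
}

## Same steps as the session above
setInterface("AnInterface",
             foo = function(x, ...) stop("not implemented"),
             bar = function(x, y = 1, ...) stop("not implemented"))
setClass("A", representation(x = "numeric"))
setImplementation("A", "AnInterface",
                  foo = function(x, ...) sum(slot(x, "x")))
setGeneric("baz", function(intrfc, ...) standardGeneric("baz"))
setMethod("baz", "AnInterface", function(intrfc, ...) {
    getImplementation(intrfc, "AnInterface")$foo(intrfc)
})
baz(new("A", x = 1:5))   # 15
```

In this sketch the members that a class leaves unimplemented fall back to
the interface defaults, so calling bar() on "A" stops with "not
implemented" rather than failing at dispatch.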

Martin

> In that case, there is not a lot to be gained by defining an interface class,
> one could just have a regular function (and indeed code reuse could be obtained
> by having the methods just be interfaces to that function).
>
> I disagree a little with the characterization that it is a more developer
> friendly paradigm -- that depends on which developer we are talking about. The
> one doing the original implementation, or the one that wants to add to it.  I
> think that the approach James suggests makes more work initially, and less later
> on.  But in any event, it is a good thing for folks to think about.
>
> The good thing, is that if things are as James suggests, I don't think he needs
> to write a very big patch to get the behavior he wants :-). And I at least don't
> see the discussion as complaining,
>
>  best wishes
>    Robert
>
>
> Sean Davis wrote:
>> On Tue, Nov 25, 2008 at 9:04 PM, James Bullard <bullard at berkeley.edu> wrote:
>> 
>>> On Nov 25, 2008, at 5:27 PM, Sean Davis wrote:
>>>
>>>
>>>> On Tue, Nov 25, 2008 at 7:22 PM, James Bullard <bullard at berkeley.edu>
>>>> wrote:
>>>> Thanks all for the comments. I somehow never happened on
>>>> showMethods(class="AffyBatch"). Comments below.
>>>>
>>>>
>>>> On Nov 25, 2008, at 1:08 PM, Robert Gentleman wrote:
>>>>
>>>> Hi,
>>>>
>>>> Martin Morgan wrote:
>>>> James Bullard wrote:
>>>> hi all, this will probably demonstrate my lack of knowledge concerning
>>>> OOP in R, but I am hoping for some quick answers. This is a problem I
>>>> have faced before.
>>>>
>>>> I want to use the method bg.correct.mas, this method takes as its
>>>> object an AffyBatch. I don't have an AffyBatch nor do I want to
>>>> massage my data structures into such an object, so I want to implement
>>>> the AffyBatch interface. However, I can see no way to determine the
>>>> list of generics which have methods defined on AffyBatch (and
>>>> superclasses). I understand that things are method-centric, however I
>>>> assume that being method-centric still leaves room for a way to know
>>>> the methods specialized for a class/interface so that I as a
>>>> programmer can define the suitable methods on a new class without
>>>> having to dig around all over the place determining what I need to
>>>> define. My question is:
>>>>
>>>> is there a function to determine all the methods that are specialized
>>>> for a certain class? Also, would it be possible to write a function
>>>> adheresTo(classA, classB), which tells me that classB satisfies the
>>>> calling requirements of classA (forget for a moment that we have
>>>> public member variables).
>>>>
>>>> note, i don't want to make a subclass of AffyBatch.
>>>>
>>>>
>>>> showMethods(class="AffyBatch")
>>>>
>>>> gets you some of the way there. But in a brand spanking new session it
>>>> shows nothing (because the packages where methods are defined, e.g.,
>>>> affy, have not been loaded) and this illustrates a fundamental problem:
>>>> the interface to AffyBatch is dynamically determined by loaded packages.
>>>>
>>>> Likely you'd invoke
>>>>
>>>> showMethods(class="AffyBatch", where=getNamespace("affy"))
>>>>
>>>> to get a kind of base-line set of expected methods, i.e., those visible
>>>> from affy.
>>>>
>>>> But this all seems like a weird way to go.  I guess I don't really see
>>>> the use case for:
>>>>  I know this class (AffyBatch), and I want to find all generics that have
>>>> methods for it, so I can implement my own class and the same set of
>>>> methods.
>>>>
>>>>
>>>> Why not figure out what operations you think you want to support, and
>>>> support those? What is so special about the set of generic functions
>>>> that have AffyBatch methods that you would want to support those?
>>>>
>>>> Nothing is special about the set of generic functions that have AffyBatch
>>>> methods except that the bg.correct.mas is defined to take an AffyBatch
>>>> object and if I want to call this function I have to subclass AffyBatch or
>>>> construct an AffyBatch. I don't have a CDF file, but I do have a notion of
>>>> PM/MM probes. The real issue is that methods which are defined on classes
>>>> are often unusable when you can't construct the object. The ideal way to do
>>>> this would be (in my opinion)
>>>>
>>>> setMethod("indexProbes", signature(object = "AffyInterface", which =
>>>> "character"), function(object, which, ...) {
>>>>       stop("No suitable method defined.")
>>>> })
>>>>
>>>> setMethod("bg.correct.mas", signature(object = "AffyInterface"),
>>>> function(object, ...) {
>>>>       # current definition goes through.
>>>> })
>>>>
>>>> Where AffyInterface is a virtual class (also no member variables) but has
>>>> a set of methods associated with it. This is a much more "developer"
>>>> friendly way to do it because then I don't have to concern myself with the
>>>> internals of AffyBatch, just the interface that I need to implement to call
>>>> a particular generic that I am interested in. I then would subclass the
>>>> AffyInterface class which would impose no stricture on how I represent my
>>>> class members, it would only (although not explicitly) proclaim that I
>>>> implement the set of methods in AffyInterface.
>>>>
>>>> As Martin and I pointed out, perhaps in less than transparent words, there
>>>> is no static concept of an interface to implement.  The "interface" is
>>>> defined dynamically based on what methods are available in the search path.
>>>> If there is a particular method for an AffyBatch that you want to
>>>> implement for your class, you are free to do so by simply creating that
>>>> method for your class.  If you think about it, what you are proposing is to
>>>> do exactly that, since the methods really do need to access the internals of
>>>> the class in order to operate.
>>>>
>>> I think you are not understanding. What you say above is incorrect
>>> ("methods really do need to access the internals of the class");
>>> bg.correct.mas is a perfect example. If it *really* needed to access the
>>> internals of a class then you would have a lot of @s in the code, but you
>>> have none. This is because you can abstract the concepts of bg.correct.mas
>>> away from a particular representation of the data. This is what I am trying
>>> to drive home.
>>>
>>>
>> The reason that there are no @s in bg.correct.mas is that other methods are
>> used (notice that indexProbes() and intensity() are both methods).  Of
>> course, those other methods (or methods they are built on) need to access
>> the internals.  If you want to have methods that abstract out the @s, you
>> have to write them for your class.  Alternatively, you need to make your
>> object look like an AffyBatch so that you benefit from its methods via
>> inheritance or directly.
>> 
>> Sean
>> 
>> 
>>>
>>>  The class has no methods "built into" it like other languages where the
>>>> methods belong to the class directly.
>>>>
>>> This is not at issue here. If you have an API which is object oriented then
>>> what is the right way to expose the functionality of that library? If you
>>> tie the logic methods into a particular representation then the only thing
>>> you gain from object orientation is encapsulation, but you have lost the
>>> benefit of dispatch because it is *too* hard to subclass when
>>> representations become complicated, in other words you tie your contract
>>> (the set of methods that something needs to implement) to an implementation
>>> -- this might not be a bad way to design code for an in-house project, but
>>> it is a bad way to make/expose libraries to other programmers.
>>>
>>> Also, I do understand that I could use setMethod downstream on classes thus
>>> making the interface a moving target, but I think that is irrelevant here.
>>> The core classes and generics of bioconductor could be explicitly defined at
>>> release time and I could depend on a set of functionality exposed. Also, I
>>> understand that some of the particulars of R make this a little challenging,
>>> but I do want to stress that I think it is an error to assume that I somehow
>>> need to have a class to gain access to functionality -- that is an
>>> implementation issue.
>>>
>>> thanks, jim
>>>
>>>
>>>  As Robert was pointing out, the simplest way to reuse the code written for
>>>> AffyBatch is to make your object look like an AffyBatch.  You can, of
>>>> course, rewrite all the methods for your own data structure which might be
>>>> more efficient if you have only one or two methods that you want to
>>>> implement.
>>>>
>>>> Hope that helps a bit.
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> And then, typically the simplest way to get there is to define a coerce
>>>> method.  I know you say you don't want to do that, but basically, if the
>>>> class (AffyBatch) is widely used (I mean here that lots of methods are
>>>> defined for it, not that people use it) that exercise its components you
>>>> will need to have a class that is basically isometric to it, and then why
>>>> would you want the additional code base that comes with having your own
>>>> methods?
>>>>
>>>>
>>>> I agree that I basically need a class which is isometric to AffyBatch and
>>>> that is what I believe the problem to be. I cannot get at some functionality
>>>> because in a practical sense the generics have been tied to a class. There
>>>> is only one way to use the functionality of bg.correct.mas and that is be an
>>>> AffyBatch. I think that this is fine when it is strictly necessary or when
>>>> you are designing end-user code, but if you want programmers to build on top
>>>> of things then it is limiting.
>>>>
>>>> So I know that affy is not part of Biobase and I can solve the problem
>>>> without too much trouble in a number of ways and that the primary goal of
>>>> the package is for the end user, but I guess I am advocating for a design by
>>>> interface approach so that particular choice of representation is left to
>>>> the programmer. Thanks for the feedback and please understand that I
>>>> appreciate all of the hard work even though I am complaining a bit.
>>>>
>>>>
>>>> jim
>>>>
>>>>
>>>>
>>>> showMethods returns a connection (!) which is basically useless for
>>>> programmatic purposes, e.g., adheresTo().
>>>>
>>>> true, but you can get a slightly (only slightly) more helpful answer if
>>>> you set printTo=FALSE (in a recent revision of R 2.9.0 candidate)
>>>>
>>>>  Robert
>>>>
>>>>
>>>> Martin
>>>>
>>>>
>>>> thanks, jim
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>>
>>>> --
>>>> Robert Gentleman, PhD
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M2-B876
>>>> PO Box 19024
>>>> Seattle, Washington 98109-1024
>>>> 206-667-7700
>>>> rgentlem at fhcrc.org
>>>>
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


