[Bioc-devel] prada package and eSet in Biobase

Seth Falcon sfalcon at fhcrc.org
Thu Apr 13 16:51:37 CEST 2006


Hi Florian, all,

Florian Hahne <fhahne at gmx.de> writes:
> thanks, I've already seen it on the biocDevel mailing list. The bad
> thing is that I changed my code two days ago to make it work with
> your MultiExpressionSet subclass but now this has been kicked out
> again. So I guess I will build my own subclass instead, bad luck...

We are setting a terrible example.  We are aware of this, we aim to
improve in the future, and we beg forgiveness :-)  I hope that
the refactoring of the core classes is going to make it easier to
develop interoperable software once we get over the initial pain of
changing over.

> I think this would be a useful thing for Biobase to provide: there
> seem to be many use cases for it, and I don't see the benefit in
> implementing it over and over again. Still, I do see that having
> such a general subclass makes validation of the instantiated objects
> extremely hard. So what were the considerations for dropping
> MultiExpressionSet from Biobase?

Florian, you've cut to the heart of the issue.  Let me explain some of
the thinking (sorry for the length).

The idea of a MultiExpressionSet is that it (1) ties together
expression and sample data, and (2) allows arbitrary "stuff" to be
stored in the assayData slot (storage is a list or an environment).

As you say, there seem to be many use cases for this sort of
structure.  Take the example Martin provided of the Swirl data:
Martin wants to put four named elements (R, G, Rb, Gb) into the
assayData slot.  He could use a MultiExpressionSet for this (if it were available)
and then write some methods for MultiExpressionSet to manipulate his
data and access the R, G, Rb, Gb pieces in assayData.  The problem is,
there is nothing to guarantee that a particular MultiExpressionSet
instance has the right stuff inside the assayData slot.  If you are
good friends with Martin, he might tell you how to create his kind of
MultiExpressionSet so that you can use the methods he has written.
And Martin's methods could be written to fail gracefully, but this
would require a lot of manual checking.  Summary: this approach makes
it harder to end up with reusable and interoperable software.
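
To make the problem concrete, here is a minimal sketch in plain R using only the methods package. GenericSet and swirlM are hypothetical names invented for illustration (they are not Biobase classes or functions), and a simple list stands in for assayData storage. Any list is a valid GenericSet, so a method written for Swirl-style data has to do its own checking:

```r
library(methods)

## Hypothetical stand-in for a generic MultiExpressionSet: assayData is an
## unconstrained list, so any named elements can be stored in it.
setClass("GenericSet", slots = c(assayData = "list"))

## Hypothetical method for Swirl-style data. The class promises nothing
## about the contents of assayData, so the method must check by hand.
swirlM <- function(object) {
    ad <- object@assayData
    absent <- setdiff(c("R", "G", "Rb", "Gb"), names(ad))
    if (length(absent) > 0)
        stop("assayData lacks: ", paste(absent, collapse = ", "))
    ## background-corrected log2 ratio of the two channels
    log2((ad$R - ad$Rb) / (ad$G - ad$Gb))
}

m <- matrix(c(10, 20, 30, 40), nrow = 2)
ok  <- new("GenericSet",
           assayData = list(R = 4 * m, G = 2 * m, Rb = m, Gb = m / 2))
bad <- new("GenericSet", assayData = list(exprs = m))  # just as valid a GenericSet

swirlM(ok)        # 2 x 2 matrix of 1s
try(swirlM(bad))  # error: assayData lacks: R, G, Rb, Gb
```

Both objects are equally valid GenericSet instances; only the hand-written check inside swirlM stops the second one, and every other method touching such objects would need the same check.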

Instead, we propose that users like Martin create a subclass of eSet
with enough supporting infrastructure (initialize method, validity
checking) to ensure the proper structure is available.  This way, you
don't have to know Martin very well (you should, he's a great guy);
he has put everything you need to know into the class definition, so
you can reuse his code.
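
A minimal sketch of that pattern in plain S4, assuming two hypothetical classes: BaseSet (a stand-in for eSet with a list-valued assayData slot) and SwirlSet. This is not the Biobase API, only an illustration of moving the structure into the class definition through an initialize method and a validity function:

```r
library(methods)

## Hypothetical stand-in for eSet: a list-valued assayData slot.
setClass("BaseSet", slots = c(assayData = "list"))

## Hypothetical Swirl-style subclass. The validity function states, in the
## class definition itself, which assayData elements must be present.
setClass("SwirlSet", contains = "BaseSet",
         validity = function(object) {
             needed <- c("R", "G", "Rb", "Gb")
             ad <- object@assayData
             absent <- setdiff(needed, names(ad))
             if (length(absent) > 0)
                 return(paste("assayData lacks:", paste(absent, collapse = ", ")))
             if (!all(vapply(ad[needed], is.matrix, logical(1))))
                 return("R, G, Rb and Gb must all be matrices")
             dims <- lapply(ad[needed], dim)
             if (!all(vapply(dims, identical, logical(1), dims[[1]])))
                 return("R, G, Rb and Gb must have identical dimensions")
             TRUE
         })

## initialize method: callers supply the four channels by name and never
## need to know how they are laid out inside assayData. Empty 0 x 0
## matrices are used as defaults so that new("SwirlSet") is still valid.
setMethod("initialize", "SwirlSet",
          function(.Object, ...,
                   R = matrix(numeric(0), 0, 0), G = matrix(numeric(0), 0, 0),
                   Rb = matrix(numeric(0), 0, 0), Gb = matrix(numeric(0), 0, 0)) {
              callNextMethod(.Object, ...,
                             assayData = list(R = R, G = G, Rb = Rb, Gb = Gb))
          })

m <- matrix(1:4, nrow = 2)
s <- new("SwirlSet", R = m, G = m, Rb = m, Gb = m)
names(s@assayData)                                   # "R" "G" "Rb" "Gb"
try(new("SwirlSet", R = m, G = m, Rb = m, Gb = 1:3))  # invalid: Gb is not a matrix
```

Because the checks live in the class, any function that receives a SwirlSet can rely on R, G, Rb and Gb being present as matrices of matching size, without knowing the author.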

That is the core of it.  We want to avoid having lots of
MultiExpressionSet instances that have their own "special" structure
not defined or managed in some way by the class system.

Let me explain a bit further...  If we took an entirely object-oriented
approach, we would toss out assayData and require that
subclasses add as a slot everything they need.  For example,
ExpressionSet would have an exprs slot and Martin's swirl example
would have slots for R, G, Rb, and Gb.  This approach has advantages:
it more strictly defines the data and allows the class system to do
more of the work for you.  However, it lacks flexibility.  If you end
up with a number of slots that are only present some of the time, for
example, the slot-based approach can be more trouble than it is
worth.  The current design is aiming for a compromise.  It allows
developers to control what goes in assayData, but _encourages_ the
creation of subclasses that codify known structure.  Note also that
the current approach does not prevent subclasses from taking a more
structured and slot-based approach.  Add as many slots as you like and
add methods for accessing them.  You could even ignore the assayData
slot entirely, but you will need to override the assayData accessor
methods to do the Right Thing for your particular class.
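
The slot-based alternative could look like the following sketch. Again, BaseSet and the assayData generic are hypothetical stand-ins defined locally, not the Biobase definitions. SlotSwirlSet stores each channel in its own declared slot and overrides the accessor, so code that asks for assayData still gets the four channels:

```r
library(methods)

## Hypothetical stand-ins for eSet and its assayData accessor.
setClass("BaseSet", slots = c(assayData = "list"))
setGeneric("assayData", function(object) standardGeneric("assayData"))
setMethod("assayData", "BaseSet", function(object) object@assayData)

## Slot-based subclass: each channel is a declared slot of class "matrix";
## the inherited assayData slot is simply left empty.
setClass("SlotSwirlSet", contains = "BaseSet",
         slots = c(R = "matrix", G = "matrix", Rb = "matrix", Gb = "matrix"))

## Override the accessor so generic code asking for assayData still works.
setMethod("assayData", "SlotSwirlSet", function(object)
    list(R = object@R, G = object@G, Rb = object@Rb, Gb = object@Gb))

m <- matrix(1:4, nrow = 2)
s <- new("SlotSwirlSet", R = m, G = m, Rb = m, Gb = m)
names(assayData(s))       # "R" "G" "Rb" "Gb"
getSlots("SlotSwirlSet")  # the structure is visible from the class itself
```

The trade-off described above shows up directly: the four slots are fixed by the class, which is explicit and self-documenting, but a dataset that only sometimes has, say, background channels would still carry those slots.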

Finally, if you are still reading, I would be surprised if your use of
MultiExpressionSet didn't, in the end, need at least one slot of its
own, in which case you would be subclassing anyway.


