[Bioc-devel] prada package and eSet in Biobase

Seth Falcon sfalcon at fhcrc.org
Thu Apr 13 23:47:57 CEST 2006


Florian Hahne <f.hahne at dkfz.de> writes:
> thanks for the extensive essay about eSets...
> I totally agree with the concept of the virtual eSet class and indeed a
> MultiExpressionSet subclass would be contrary to all the ideas behind
> it. But as you know, people are lazy and always try to take the path
> of least resistance...

I think you are saying that you are "happy" with the latest changes.
If not, and you have a use case for a MultiExpressionSet type class,
please give us some details.

> Regarding discipline: It isn't that much work to program this stuff
> and Martin's code stubs are already very helpful. Maybe two or three
> more sentences what the function calls are doing and it can already be
> put into a vignette. If you are interested I can try to condense the
> messages of this thread into a vignette once all the infrastructural
> stuff is really finished.

Sure!  Contributions of such vignettes are certainly welcome.

> But there still remain a couple of open questions:
>
> 1.) I prefer to set the validation function already with the class
>   definition. Is there a reason I should not do that and use
>   setValidity instead?

Not that I know of.  I agree with you that setting validity in the
class definition keeps things all in one place and is to be preferred
(unless someone can tell us why using setValidity is better).
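
For concreteness, here is a rough sketch of the two styles side by side. It
uses only the methods package; ToySetA and ToySetB are made-up toy classes
for illustration, not anything in Biobase.

```r
library(methods)

# Style 1: validity supplied in the class definition itself.
# (ToySetA is a hypothetical class, not part of Biobase.)
setClass("ToySetA",
         representation(exprs = "matrix", se = "matrix"),
         validity = function(object) {
             if (!identical(dim(object@exprs), dim(object@se)))
                 return("exprs and se must have the same dimensions")
             TRUE
         })

# Style 2: class defined first, validity attached afterwards.
setClass("ToySetB", representation(exprs = "matrix", se = "matrix"))
setValidity("ToySetB", function(object) {
    if (!identical(dim(object@exprs), dim(object@se)))
        return("exprs and se must have the same dimensions")
    TRUE
})

m <- matrix(1:9, ncol = 3)

# new() runs the validity check whenever slots are supplied.
okA <- new("ToySetA", exprs = m, se = m)
badB <- tryCatch(new("ToySetB", exprs = m, se = m[, 1:2]),
                 error = function(e) conditionMessage(e))
```

Both classes reject mismatched matrices in the same way; the only difference
in this sketch is where the check is written down.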

> 2.) Does it make sense to program the constructor/initializer in a way
>   that one has to give all the items of assayData individually?
>   Wouldn't it be better to accept a predefined assayData object? Or
>   should one just do something like this:
>
> obj <- new("mytestSet")
> ad <- assayDataNew(storage.mode = "list",
>                    exprs = matrix(1:9, ncol = 3),
>                    other = matrix(1:9, ncol = 3),
>                    stuff = matrix(1:9, ncol = 3))
> assayData(obj) <- ad

To answer this, we need to discuss the two important use cases that
we are trying to address...

Case 1: Fixed Components
------------------------

The data is organized into a (small) number of fixed components.
Examples:

* Expression data has exprs, se.exprs

* Two-color data has R, G, Rb, Gb

Case 2: Multiple Non-Fixed Components
-------------------------------------

[I'm less familiar with this use case, but I think it is the one you
have, so perhaps you can help me explain it.]

The data has a variable number of components.  Fixed names do not make
sense here.  Examples:

* Matrices of numbers, one for each 96-well plate in some type of
  experiment.  Plates have ids perhaps, but these change from one
  instance to another.  I _think_ that one assumes that each matrix is
  the same measurement but for different "plates".

For the Fixed Components case, I think there is a clear advantage to
customizing the initialize method such that users can build complete
and correct instances easily.  Having to specify assayData contents
separately seems convoluted and error prone; you won't get an error
message until the eSet subclass' validity check is run.  
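
A minimal sketch of what such an initialize method might look like for the
two-color case, assuming a toy class whose "assay" list slot stands in for
assayData. ToyTwoColor and its slot are hypothetical names; this is not
Biobase code.

```r
library(methods)

# Hypothetical container with fixed components R and G.
setClass("ToyTwoColor",
         representation(assay = "list"),
         validity = function(object) {
             required <- c("R", "G")
             absent <- setdiff(required, names(object@assay))
             if (length(absent))
                 return(paste("missing assay elements:",
                              paste(absent, collapse = ", ")))
             if (!identical(dim(object@assay$R), dim(object@assay$G)))
                 return("R and G must have the same dimensions")
             TRUE
         })

# The fixed components become named arguments of new(); the method
# packs them into the assay slot and lets the default method validate.
setMethod("initialize", "ToyTwoColor",
          function(.Object, ...,
                   R = matrix(numeric(0), 0, 0),
                   G = matrix(numeric(0), 0, 0)) {
              callNextMethod(.Object, ..., assay = list(R = R, G = G))
          })

tc <- new("ToyTwoColor",
          R = matrix(1:6, ncol = 2),
          G = matrix(6:1, ncol = 2))

# A mismatch is reported at construction time, not later.
err <- tryCatch(new("ToyTwoColor",
                    R = matrix(1:6, ncol = 2),
                    G = matrix(1:4, ncol = 2)),
                error = function(e) conditionMessage(e))
```

The user never touches the assay slot directly, and a bad combination of
components fails in the call to new().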

For the Multiple Non-Fixed Components case, one doesn't want to have
to name each element of assayData in the call to new().  Instead, one
could do as you propose, creating an AssayData instance separately.
However, as a user, I might appreciate a custom initialize method that
allowed me to provide a list of matrices so that I don't have to know
as much about the internals.
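
Something along these lines, perhaps. This is only a sketch under my own
assumptions about the plate use case: ToyPlateSet, its "assay" slot, and
the "plates" argument are invented for illustration and are not Biobase API.

```r
library(methods)

# Hypothetical container for a variable number of same-shaped matrices.
setClass("ToyPlateSet",
         representation(assay = "list"),
         validity = function(object) {
             a <- object@assay
             if (length(a) == 0L) return(TRUE)
             nms <- names(a)
             if (is.null(nms) || any(nms == "") || anyDuplicated(nms))
                 return("every plate needs a unique, non-empty name")
             if (!all(vapply(a, is.matrix, logical(1))))
                 return("every plate must be a matrix")
             dims <- lapply(a, dim)
             if (!all(vapply(dims, identical, logical(1), dims[[1]])))
                 return("all plates must have the same dimensions")
             TRUE
         })

# new() accepts a named list of matrices; no element names are fixed.
setMethod("initialize", "ToyPlateSet",
          function(.Object, ..., plates = list()) {
              callNextMethod(.Object, ..., assay = plates)
          })

plates <- list(plate07 = matrix(rnorm(96), nrow = 8),
               plate12 = matrix(rnorm(96), nrow = 8))
ps <- new("ToyPlateSet", plates = plates)
names(ps@assay)
```

The plate ids live in the names of the list, so they can differ from one
instance to the next while the validity check still enforces a common shape.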

Thoughts?  I'd welcome help clarifying use case #2 (or describing other
important use cases).

> 3.) Shouldn't all of eSet's replacement methods call validObject
>   before they finish? I always find it a waste of time writing
>   validator functions and then by doing a stupid replacement (like
>   the one above) create invalid objects without even noticing.

This is certainly up for discussion.  Validation is an expensive
(computation time) operation so we need to choose carefully where/when
we trigger it.

But validating at least in a replacement method on the assayData slot
makes a fair bit of sense to me and could be worth a try.
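
To make the idea concrete, here is a small sketch of a replacement method
that validates before returning. It uses a toy class and a made-up "assay<-"
generic built with the methods package only; it does not show how eSet's
own replacement methods are written.

```r
library(methods)

# Hypothetical class that requires an element named "exprs".
setClass("ToyCheckedSet",
         representation(assay = "list"),
         validity = function(object) {
             if (!"exprs" %in% names(object@assay))
                 return("assay must contain an element named 'exprs'")
             TRUE
         })

# Made-up replacement generic for the sketch.
setGeneric("assay<-", function(object, value) standardGeneric("assay<-"))

setReplaceMethod("assay", "ToyCheckedSet", function(object, value) {
    object@assay <- value
    validObject(object)  # the extra cost is paid on every replacement
    object
})

obj <- new("ToyCheckedSet", assay = list(exprs = matrix(1:9, ncol = 3)))

# A replacement that drops "exprs" now fails immediately,
# and obj keeps its previous, valid contents.
res <- tryCatch({
    assay(obj) <- list(other = matrix(1:9, ncol = 3))
    "accepted"
}, error = function(e) "rejected")
```

Whether the check is worth its run time presumably depends on how large the
data are and how often the slot is replaced.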

> Of course eventually it will be necessary to overwrite the replacement
> methods of eSet if your class structure gets sophisticated but for the
> day to day usage most people will probably rely on the methods
> provided by eSet.
>
> Hope these questions are not completely stupid but that's what I
> noticed while playing around with the new stuff.

Excellent feedback.  The questions are quite far from stupid.  I hope
other interested folks will chime in with their questions and
comments.

Best,

+ seth


