[Bioc-devel] Couple of eSet questions

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Fri Feb 3 22:05:27 CET 2006

> On  3 Feb 2006, stvjc at channing.harvard.edu wrote:
> > it sounds to me as if we are finding some use cases, and that
> > we want to extend eSet to cope with those that have well-defined
> > requirements.  all we know about assayData at present is that
> > it is either a list or an environment.  for twochannel data
> > we may want to define a specific extension of listOrEnv (i think
> > this is possible) that has guaranteed structure and names.
> > for data with standard errors we might have to do likewise.
> >
> > but i would say that part of the intention of the eSet is to
> > require the developer always to allow an environment representation
> > for the assayData.  the validity criteria can impose restrictions
> > on this environment.
> Respectfully, I think I disagree.  I would like to have the use cases
> drive the design of specific subclasses of an eSet-like class where
> the structure is expressed as part of the class definition
> (e.g. exprSet should have an exprs slot, not a named element of an
> env).

I think the use case should be used to describe what behaviors
we want and then the representation can be chosen to allow
those behaviors.

The internal representation should be subordinate
to the methods that expose the structure through its behaviors.
Now perhaps my comment on requiring the developer to allow for
an environment is inconsistent with this position.

> Pushing the definition of the structure to the validity function makes
> the actual structure harder to see (IMO) and I'm concerned that it
> will make extensions which otherwise would be trivial subclasses,
> tricky.  I guess a part of my objection is that it feels as though we
> will be implementing our own mini class system where slots are the
> named elements of an env.

I am open to discussion of how visible a structure needs to be
in our project.  If it is only visible through
methods, it shouldn't matter whether the information components are
slots or environment elements.  And the class designers should
be free to change the internal representations without downstream
consequences.  We have not achieved this in several domains --
should we be trying harder?
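
As a minimal sketch of that point (the class names SlotAssay and
EnvAssay and the generic getExprs are hypothetical, made up for this
illustration, and are not part of Biobase): the same matrix is stored
once in a slot and once as a named element of an environment, and
callers reach it only through the accessor.

```r
library(methods)

# Hypothetical classes: same information, two internal representations.
setClass("SlotAssay", representation(exprs = "matrix"))
setClass("EnvAssay", representation(assayData = "environment"))

# Hypothetical generic; downstream code is assumed to use only this.
setGeneric("getExprs", function(object) standardGeneric("getExprs"))
setMethod("getExprs", "SlotAssay", function(object) object@exprs)
setMethod("getExprs", "EnvAssay",
          function(object) get("exprs", envir = object@assayData))

m <- matrix(1:6, nrow = 2)
e <- new.env()
assign("exprs", m, envir = e)

a <- new("SlotAssay", exprs = m)
b <- new("EnvAssay", assayData = e)

# The accessor returns the same matrix for either representation.
sameResult <- identical(getExprs(a), getExprs(b))
```

If all access goes through getExprs, switching a class from one
representation to the other would only require changing its method.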

> A compelling argument for forcing the actual data to be in an
> environment is to avoid copying.

That's the intention -- not to force people to use environments,
but to allow them to do so when it makes sense.
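
A small sketch of the difference in plain R (the helper names
setCellList and setCellEnv are hypothetical): a list handed to a
function behaves as a copy once it is modified, whereas an environment
is passed by reference, so the caller sees the change and the
container itself is not duplicated.

```r
# The same 2 x 2 matrix held in a list and in an environment.
asList <- list(exprs = matrix(0, nrow = 2, ncol = 2))
asEnv <- new.env()
assign("exprs", matrix(0, nrow = 2, ncol = 2), envir = asEnv)

# Hypothetical helper: modifies its own copy of the list and returns it.
setCellList <- function(x) {
  x$exprs[1, 1] <- 1
  x
}

# Hypothetical helper: updates the element inside the shared environment.
setCellEnv <- function(x) {
  m <- get("exprs", envir = x)
  m[1, 1] <- 1
  assign("exprs", m, envir = x)
  invisible(x)
}

modified <- setCellList(asList)  # the caller's asList is left unchanged
setCellEnv(asEnv)                # the caller's asEnv is updated in place

listValue <- asList$exprs[1, 1]
envValue <- get("exprs", envir = asEnv)[1, 1]
```

Here listValue stays at 0 while envValue becomes 1.  The flip side is
that every object sharing the environment sees the update, which is
why this is something to allow rather than require.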

> > eSet itself does not need to solve the two channel or
> > error-available problems at once.  it should be extended to do so,
> > with explicit use cases stated.
> Yes, I'm just not convinced that eSet has any business having actual
> data slots; those are the domain of its subclasses.

It would be good to get some agreement on this.  I would say that
the eSet schematizes high throughput assay data.  We hold, to
some benefit, that there needs to be an assayData component, and that
it has reporterInfo and phenoData by virtue of its extension of
annotatedDataset.  Less than this and we are not solving the
high-throughput problem.  It has some other slots that I am not
so sure about that seem to be there for continuity with the
previous incarnation.

> Putting aside my perhaps ideological objections, maybe a compromise is
> to work on some of the concrete subclasses (two-channel data being one
> good example) and factor out the common elements as they evolve.

I agree.  I don't mean to be very pedantic about the representation/
method access concepts ... I have to run now.
