[Bioc-devel] RFC: eSet with two color data

Mon Mar 26 17:19:54 CEST 2007

On Monday 26 March 2007 11:02, Seth Falcon wrote:
> Wolfgang Huber <huber at ebi.ac.uk> writes:
> > Hi Seth,
> >
> >> Thinking through option A, it also requires infrastructure because it
> >> breaks the subsetting model in the other direction.  If a user does:
> >>
> >>     x[1:3, ]
> >>
> >> what should happen? With the current code, they would get something
> >> that would not be a valid n-colour set and probably would not be
> >> desired in this context.  Since both options require some subset
> >> coding, I think going for the option that suggests more efficiency
> >> is best.
> >
> > What should happen is that they get an ExpressionSet with only three
> > features, but the same samples (arrays x colours = columns of exprs and
> > rows of phenoData). The different colours are stored sideways, so unless
> > I really need to catch up with recent changes in Expression specs, I
> > think the row-subsetting does what we want.
>
> Sorry, not enough coffee for me yet.  Yes, row-subsetting does what we
> want I think.  What I'm concerned about is column-subsetting:
>
>     x[, 1:3]
>
> The result is no longer an n-colour set and this breaks the "["
> concept.  With option A you want certain code to be able to assume
> that the structure of the exprs matrix is ncol = arrays * colours.
> Without some coding effort, you cannot have this work in a general
> way.

Seth,

I see your point.  However, since both options A and B encode the same 
information (don't they?), it should be possible to do column subsetting with 
either.  What piece of information is missing that keeps column subsetting 
from working in both cases?  If that piece of information is truly missing or 
not implied, then, in my mind, the model needs to be enriched; options A and 
B should really be equivalent from an information-content point of view, 
shouldn't they?
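To make the question concrete, here is a minimal base-R sketch with toy data. It assumes that option B stores the data as a features x arrays x colours array, and that option A keeps a per-column annotation of array and colour next to the sideways exprs matrix (ncol = arrays * colours). The names `ann`, `isComplete` and `subsetArrays` are hypothetical and are not existing Biobase functions.

```r
## Hypothetical toy data: 4 features, 3 arrays, 2 colours.
nFeat <- 4L; nArr <- 3L; nCol <- 2L

## Option B (assumed layout): a features x arrays x colours array.
B <- array(seq_len(nFeat * nArr * nCol), dim = c(nFeat, nArr, nCol))

## Option A: colours stored "sideways", ncol = arrays * colours.
## Column-major storage puts the array index fastest within each colour.
A <- matrix(B, nrow = nFeat)
ann <- data.frame(array  = rep(seq_len(nArr), times = nCol),
                  colour = rep(seq_len(nCol), each  = nArr))

## With 'ann', a column subset can be checked for validity:
## every retained array must have exactly one column per colour.
isComplete <- function(ann, nCol) {
  tab <- table(ann$array, factor(ann$colour, levels = seq_len(nCol)))
  all(tab == 1L)
}

## Subsetting by array (rather than by raw column) keeps the set valid.
subsetArrays <- function(A, ann, arrays) {
  keep <- ann$array %in% arrays
  list(exprs = A[, keep, drop = FALSE], ann = ann[keep, , drop = FALSE])
}

isComplete(ann[1:3, ], nCol)   # FALSE: x[, 1:3] keeps only colour 1
sub <- subsetArrays(A, ann, c(1L, 3L))
isComplete(sub$ann, nCol)      # TRUE

## Same content as subsetting option B along its array dimension.
identical(array(sub$exprs, dim = c(nFeat, 2L, nCol)),
          B[, c(1L, 3L), , drop = FALSE])   # TRUE
```

Under these assumptions, the column annotation is the piece of information in question: with it, option A can reject x[, 1:3] or map a request for whole arrays onto the right columns. Without it, option A cannot tell a valid subset from an invalid one.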

Sean