[Bioc-devel] RFC: eSet with two color data

Wolfgang Huber huber at ebi.ac.uk
Sun Mar 25 15:56:17 CEST 2007


Dear Seth,

> What are typical actions with such an object?  I'm particularly
> interested in access patterns for subsetting.  Is getting a matrix for
> each color a common thing to do?
> 
> I think the data organization of the expression values in option B
> (congruent matrices in assayData, one for each color) has some
> advantages in terms of accessing a given color in an efficient manner.
> Computing ratios of colors is easily vectorized and fast.  With option A
> neither operation is quite as straightforward, I think.
> 
> It is true that option B would require some amount of coding.  Martin
> Morgan and I discussed this a bit, and we realized that one could have
> phenoData exactly the same as in option A.  The phenoData table would
> have a special column (label/dye/color/colour) and values would
> correspond to named matrices in assayData.  The eSet extension would
> then handle subsetting (this is the infrastructure that would need
> coding).
> 
> I suspect that the efficiency difference in obtaining an expression
> matrix for a particular dye will make option B worth the effort.

Thanks, these are good points. The two options seem to be equivalent, 
and either would work; if there is a volunteer to implement B, that 
would be great.

Just to note - computational efficiency is very important, but I don't 
think that this current question is one of the bottlenecks in the 
overall workflows, so an investment here may not bring many returns:

Computing the log-Ratios is an important operation, but typically this 
is done once in the lifetime of a dataset, and perhaps the best way to 
think of this is to have a function logRatio() that takes a two-colour 
ExpressionSet and returns an M-value ExpressionSet (similar for 
log-Product). The computational overhead of doing something like

	x[,idxGreen] - x[,idxRed]

versus

	x[[1]] - x[[2]]

once or a few times is not large compared to many other things we do 
with ExpressionSets, and I don't think it would by itself justify a lot 
of new infrastructure.
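
To make the comparison concrete, here is a minimal sketch in plain R. It 
uses an ordinary matrix and a list of matrices as stand-ins for the two 
layouts, not a real eSet. The names dye and assay, the simulated data, and 
the body of logRatio() are assumptions for illustration only; they are not 
existing Biobase code, and the values are taken to be on the log scale 
already.

```r
# Sketch only: plain matrices stand in for the assayData of an eSet.
set.seed(1)
nFeatures <- 5
nArrays   <- 3

# Option A: one matrix with one column per (array, dye) combination.
# A dye vector (a column of phenoData in a real object) labels the columns.
x   <- matrix(rnorm(nFeatures * 2 * nArrays, mean = 8), nrow = nFeatures)
dye <- rep(c("green", "red"), times = nArrays)
idxGreen <- which(dye == "green")
idxRed   <- which(dye == "red")

# Option B: congruent matrices, one per dye, held in a named list.
assay <- list(green = x[, idxGreen, drop = FALSE],
              red   = x[, idxRed,   drop = FALSE])

# Hypothetical logRatio(): assumes both inputs are log-scale intensities
# with matching dimensions, so the log-ratio is a plain difference.
logRatio <- function(green, red) {
  stopifnot(identical(dim(green), dim(red)))
  green - red
}

mA <- logRatio(x[, idxGreen, drop = FALSE], x[, idxRed, drop = FALSE])  # option A
mB <- logRatio(assay[[1]], assay[[2]])                                  # option B

identical(mA, mB)
```

In both cases the result is one M-value column per array; the only 
difference is the extra column indexing in option A.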


Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber


