[Bioc-devel] RFC: eSet with two color data
Wolfgang Huber
huber at ebi.ac.uk
Sun Mar 25 15:56:17 CEST 2007
Dear Seth,
> What are typical actions with such an object? I'm particularly
> interested in access patterns for subsetting. Is getting a matrix for
> each color a common thing to do?
>
> I think the data organization of the expression values in option B
> (congruent matrices in assayData, one for each color) has some
> advantages in terms of accessing a given color in an efficient manner.
> Ratios of colors are easily and quickly vectorized. With option A neither
> operation is quite as straightforward, I think.
>
> It is true that option B would require some amount of coding. Martin
> Morgan and I discussed this a bit and realized that one could have
> phenoData exactly the same as in option A. The phenoData table would
> have a special column (label/dye/color/colour) and values would
> correspond to named matrices in assayData. The eSet extension would
> then handle subsetting (this is the infrastructure that would need
> coding).
>
> I suspect that the efficiency difference in obtaining an expression
> matrix for a particular dye will make option B worth the effort.
Thanks, these are good points. Both options seem equivalent and either
would work; if there is a volunteer to implement B, that would be
great.
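To make the two layouts concrete, here is a minimal base-R sketch with made-up toy data. It uses plain matrices and data frames, not the Biobase eSet classes. Option A is a single matrix with one column per (array, dye) channel plus a phenoData-like table; option B keeps one congruent matrix per dye in a named list, as assayData would. Following the suggestion above, the same table serves both options, with its `dye` values naming the matrices in the list. The column names `array` and `dye`, the dye labels, and the helpers `getChannel()` and `subsetB()` are hypothetical; the helpers stand in for the subsetting infrastructure an eSet extension would need.

```r
# Toy two-colour data: 4 features on 3 arrays (random, illustrative values).
set.seed(1)
feats <- paste0("f", 1:4)
arrays <- c("a1", "a2", "a3")
green <- matrix(2^runif(12, 6, 12), nrow = 4, dimnames = list(feats, arrays))
red <- matrix(2^runif(12, 6, 12), nrow = 4, dimnames = list(feats, arrays))

## Option A: one matrix, one column per (array, dye) channel.
xA <- cbind(green, red)
colnames(xA) <- c(paste0(arrays, ".G"), paste0(arrays, ".R"))

## phenoData-like table, one row per channel; the hypothetical 'dye'
## column holds the dye label of each column of xA.
phenoA <- data.frame(array = rep(arrays, 2),
                     dye = rep(c("green", "red"), each = 3),
                     row.names = colnames(xA),
                     stringsAsFactors = FALSE)

## Option B: congruent matrices, one per dye, named by the dye labels
## used in phenoA (mirroring a named assayData list).
xB <- list(green = green, red = red)

## Hypothetical lookup for option B: a phenoA row maps to a column of
## the matrix whose name is that row's 'dye' value.
getChannel <- function(x, pheno, row) {
  x[[pheno[row, "dye"]]][, pheno[row, "array"]]
}

## Hypothetical subsetting for option B: apply the same feature/array
## index to every matrix so that they stay congruent.
subsetB <- function(x, i, j) {
  lapply(x, function(m) m[i, j, drop = FALSE])
}

s <- subsetB(xB, 1:2, c("a1", "a3"))
```

In this sketch, getting one dye's matrix is a single `[[` lookup under option B. Under option A it is a column subset driven by `phenoA$dye`.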
Just to note: computational efficiency is very important, but I don't
think this particular question is one of the bottlenecks in the
overall workflows, so an investment here may not bring much return.
Computing the log-ratios is an important operation, but typically it
is done once in the lifetime of a dataset, and perhaps the best way to
think of it is as a function logRatio() that takes a two-colour
ExpressionSet and returns an M-value ExpressionSet (and similarly for
the log-product). The computational overhead of doing something like
x[,idxGreen] - x[,idxRed]
versus
x[[1]] - x[[2]]
once or a few times is not large compared with many other things we do
with ExpressionSets, and I don't think it would by itself justify a lot
of new infrastructure.
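The logRatio() function is only proposed here. It is not an existing Bioconductor function. A minimal base-R sketch under that assumption follows. It works on plain log2-scale matrices rather than on a real ExpressionSet, and it checks that the two expressions above give the same values on made-up toy data. The dye labels `"green"`/`"red"` and the `dye` argument are hypothetical. The sign follows the expressions above (green minus red); swap the operands for the opposite convention.

```r
# Toy log2-scale intensities: 3 features on 2 arrays (made-up values).
lg <- matrix(c(8, 9, 10, 7, 11, 12), nrow = 3)  # green channel
lr <- matrix(c(9, 9, 8, 7, 13, 10), nrow = 3)   # red channel

## Option A layout: one matrix, dye given per column.
x <- cbind(lg, lr)
dye <- rep(c("green", "red"), each = 2)
idxGreen <- which(dye == "green")
idxRed <- which(dye == "red")

## Option B layout: a list of congruent matrices, one per dye.
xl <- list(green = lg, red = lr)

## The two expressions from the message give the same values.
stopifnot(all(x[, idxGreen] - x[, idxRed] == xl[[1]] - xl[[2]]))

## Hypothetical logRatio(): per-array log-ratio matrix for either layout.
logRatio <- function(x, dye = NULL) {
  if (is.list(x)) {
    x[["green"]] - x[["red"]]                   # option B
  } else {
    x[, dye == "green", drop = FALSE] -
      x[, dye == "red", drop = FALSE]           # option A
  }
}

mA <- logRatio(x, dye)
mB <- logRatio(xl)
```

Since this runs once per dataset, the cost of the column subset in option A hardly matters compared with the rest of a typical analysis.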
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber