[Bioc-devel] RFC: eSet with two color data
sfalcon at fhcrc.org
Thu Mar 22 01:50:05 CET 2007
Wolfgang Huber <huber at ebi.ac.uk> writes:
> How can we best represent preprocessed, normalised data from a set of
> two- (or n-) colour arrays in an eSet like structure? I would like to
> keep the intensity information of each channel, and not reduce to
> M-values since that loses information.
> I see two options:
> A) in an ExpressionSet-derivative called e.g. "ExpressionSetWithColors"
> with ncol = n times the number of arrays, and with mandatory phenoData
> columns named e.g. "arrayID" and "dye" .
> B) in an eSet-derivative with ncol = the number of arrays, and n
> congruent matrices in the assayData slot.
> Currently I prefer A, because
> - most of the infrastructure is already there and the additional work is
> - in B, the interpretation of the phenoData columns gets mushy because
> some columns will refer to the arrays, others to one particular sample
> of the n hybridised to each array, and we need additional infrastructure
> to resolve that.
What are typical actions with such an object? I'm particularly
interested in access patterns for subsetting. Is getting a matrix for
each color a common thing to do?
I think the data organization of the expression values in option B
(congruent matrices in assayData, one for each color) has some
advantages in terms of accessing a given color in an efficient manner.
Ratios of colors are vectorized easily and quickly. With option A neither
operation is quite as straightforward, I think.
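
To make the access-pattern comparison concrete, here is a rough,
language-neutral sketch in plain Python. It is not Biobase code. The
toy intensities, the dye names "Cy3"/"Cy5", and the helper names
channel_a, channel_b and log_ratio are all assumptions made up for
illustration. Under option B a channel is a single lookup by name;
under option A the columns for a dye first have to be located through
the annotation.

```python
import math

# Hypothetical toy data: 3 features, 2 arrays (a1, a2), 2 dyes.

# Option B: one features-by-arrays matrix per dye, all the same shape.
assay_b = {
    "Cy3": [[100.0, 200.0], [50.0, 80.0], [400.0, 100.0]],
    "Cy5": [[200.0, 100.0], [50.0, 160.0], [100.0, 400.0]],
}

# Option A: one wide matrix with a column per (array, dye) pair, plus
# per-column annotation playing the role of the phenoData columns.
pheno_a = [
    {"arrayID": "a1", "dye": "Cy3"},
    {"arrayID": "a1", "dye": "Cy5"},
    {"arrayID": "a2", "dye": "Cy3"},
    {"arrayID": "a2", "dye": "Cy5"},
]
exprs_a = [
    [100.0, 200.0, 200.0, 100.0],
    [50.0, 50.0, 80.0, 160.0],
    [400.0, 100.0, 100.0, 400.0],
]


def channel_b(assay, dye):
    """Option B: the channel matrix is stored under its dye name."""
    return assay[dye]


def channel_a(exprs, pheno, dye):
    """Option A: scan the annotation for matching columns, then copy them.

    Assumes the columns of each dye appear in the same array order.
    """
    cols = [j for j, p in enumerate(pheno) if p["dye"] == dye]
    return [[row[j] for j in cols] for row in exprs]


def log_ratio(red, green):
    """Element-wise log2(red / green) for two congruent matrices."""
    return [
        [math.log2(r / g) for r, g in zip(red_row, green_row)]
        for red_row, green_row in zip(red, green)
    ]


# Same M-values either way, but option A pays for two column selections.
m_b = log_ratio(channel_b(assay_b, "Cy5"), channel_b(assay_b, "Cy3"))
m_a = log_ratio(channel_a(exprs_a, pheno_a, "Cy5"),
                channel_a(exprs_a, pheno_a, "Cy3"))
```

Both layouts hold the same numbers; the difference is only in how much
bookkeeping sits between the caller and a per-dye matrix.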
It is true that option B would require some amount of coding. Martin
Morgan and I discussed this a bit, and we realized that one could have
phenoData exactly the same as in option A. The phenoData table would
have a special column (label/dye/color/colour) and values would
correspond to named matrices in assayData. The eSet extension would
then handle subsetting (this is the infrastructure that would need
I suspect that the efficiency difference in obtaining an expression
matrix for a particular dye will make option B worth the effort.
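
A minimal sketch of that idea, again in plain Python rather than R,
and only under stated assumptions: the class name MultiChannelSet, its
methods, and the toy data are hypothetical and are not part of Biobase.
The annotation has one row per (array, dye) sample, its "dye" values
must name matrices in the assay data, and subsetting by array keeps
the matrices and the annotation in step.

```python
class MultiChannelSet:
    """Hypothetical container for illustration; not the Biobase eSet API."""

    def __init__(self, arrays, assay, pheno):
        self.arrays = list(arrays)  # column order shared by all matrices
        self.assay = assay          # dye name -> features-by-arrays matrix
        self.pheno = pheno          # one row per (array, dye) sample
        for p in pheno:
            # The special "dye" column must refer to a named assay matrix.
            if p["dye"] not in assay:
                raise ValueError("no assay matrix named %r" % p["dye"])
            if p["arrayID"] not in self.arrays:
                raise ValueError("unknown array %r" % p["arrayID"])

    def channel(self, dye):
        """Return the expression matrix for one dye by direct lookup."""
        return self.assay[dye]

    def subset_arrays(self, keep):
        """Keep the given arrays in every matrix and in the annotation."""
        cols = [j for j, a in enumerate(self.arrays) if a in keep]
        assay = {
            dye: [[row[j] for j in cols] for row in matrix]
            for dye, matrix in self.assay.items()
        }
        pheno = [p for p in self.pheno if p["arrayID"] in keep]
        return MultiChannelSet([self.arrays[j] for j in cols], assay, pheno)


# Toy example: 2 features, 2 arrays, 2 dyes (all values made up).
es = MultiChannelSet(
    arrays=["a1", "a2"],
    assay={
        "Cy3": [[100.0, 200.0], [50.0, 80.0]],
        "Cy5": [[200.0, 100.0], [50.0, 160.0]],
    },
    pheno=[
        {"arrayID": "a1", "dye": "Cy3", "sample": "control"},
        {"arrayID": "a1", "dye": "Cy5", "sample": "treated"},
        {"arrayID": "a2", "dye": "Cy3", "sample": "treated"},
        {"arrayID": "a2", "dye": "Cy5", "sample": "control"},
    ],
)
sub = es.subset_arrays({"a2"})
```

What should happen when someone selects only one dye's row for an array
is left open here; this sketch sidesteps it by subsetting whole arrays.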
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center