[Bioc-devel] RFC: eSet with two color data

Seth Falcon sfalcon at fhcrc.org
Thu Mar 22 01:50:05 CET 2007


Wolfgang Huber <huber at ebi.ac.uk> writes:
> How can we best represent preprocessed, normalised data from a set of
> two- (or n-) colour arrays in an eSet like structure? I would like to
> keep the intensity information of each channel, and not reduce to
> M-values since that loses information.
>
> I see two options:
>
> A) in an ExpressionSet-derivative called e.g. "ExpressionSetWithColors"
> with ncol = n times the number of arrays, and with mandatory phenoData
> columns named e.g. "arrayID" and "dye" .
>
> B) in an eSet-derivative with ncol = the number of arrays, and n
> congruent matrices in the assayData slot.
>
> Currently I prefer A, because
> - most of the infrastructure is already there and the additional work is
> little
> - in B, the interpretation of the phenoData columns gets mushy because
> some columns will refer to the arrays, others to one particular sample
> of the n hybridised to each array, and we need additional infrastructure
> to resolve that.

What are typical actions with such an object?  I'm particularly
interested in access patterns for subsetting.  Is getting a matrix for
each color a common thing to do?

I think the data organization of the expression values in option B
(congruent matrices in assayData, one for each color) has some
advantages in terms of accessing a given color in an efficient manner.
Computing ratios of colors is also easily vectorized and fast.  With
option A, neither operation is quite as straightforward, I think.
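
To make the comparison concrete, here is a minimal sketch of the two
layouts using plain Python lists rather than real Biobase objects.
The dye names (Cy3, Cy5), the toy intensities, and the helper names
channel_from_a and ratio are all assumptions made up for illustration.
It only shows where the extra lookup step in option A comes from.

```python
# Hypothetical toy data: 2 arrays x 2 dyes, 3 features (rows).

# Option A: one wide matrix with a column per (array, dye) pair,
# described by phenoData-like columns "arrayID" and "dye".
pheno_a = [
    {"arrayID": "a1", "dye": "Cy3"},
    {"arrayID": "a1", "dye": "Cy5"},
    {"arrayID": "a2", "dye": "Cy3"},
    {"arrayID": "a2", "dye": "Cy5"},
]
exprs_a = [
    [100.0, 200.0, 110.0, 55.0],
    [50.0, 50.0, 60.0, 120.0],
    [80.0, 20.0, 90.0, 90.0],
]


def channel_from_a(exprs, pheno, dye):
    """Option A: find the columns for one dye, then copy them out."""
    cols = [j for j, row in enumerate(pheno) if row["dye"] == dye]
    return [[feature[j] for j in cols] for feature in exprs]


def ratio(num, den):
    """Element-wise ratio of two congruent matrices."""
    return [[n / d for n, d in zip(nr, dr)] for nr, dr in zip(num, den)]


# Option B: congruent matrices, one per dye, with one column per array.
assay_b = {
    "Cy3": [[100.0, 110.0], [50.0, 60.0], [80.0, 90.0]],
    "Cy5": [[200.0, 55.0], [50.0, 120.0], [20.0, 90.0]],
}

# Option B: a channel is a lookup by name, and the ratio needs no indexing.
ratio_b = ratio(assay_b["Cy5"], assay_b["Cy3"])

# Option A: each channel must first be extracted via phenoData. The result
# is only correct if the Cy3 and Cy5 columns come out in the same array order.
ratio_a = ratio(
    channel_from_a(exprs_a, pheno_a, "Cy5"),
    channel_from_a(exprs_a, pheno_a, "Cy3"),
)
```

Both routes give the same ratios here, but option A depends on the
column ordering pairing the two channels of each array correctly.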

It is true that option B would require some amount of coding.  Martin
Morgan and I discussed this a bit, and we realized that one could have
phenoData exactly the same as in option A.  The phenoData table would
have a special column (label/dye/color/colour) and values would
correspond to named matrices in assayData.  The eSet extension would
then handle subsetting (this is the infrastructure that would need
coding).
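
As a rough illustration of what that subsetting infrastructure might
look like, the sketch below keeps a phenoData-style table with one row
per (array, dye) pair next to per-dye matrices with one column per
array.  It is written in plain Python, not as an eSet extension.  The
class name TwoColorSet, its methods, the "sample" column, and the
choice to subset by array ID are all hypothetical.  It shows one
possible semantics, not a worked-out design.

```python
class TwoColorSet:
    """Hypothetical container for option B; not a Biobase class."""

    def __init__(self, array_ids, assay, pheno):
        # array_ids: column order shared by every matrix in `assay`
        # assay: dict mapping dye name -> matrix (list of feature rows)
        # pheno: one dict per (array, dye) pair, as in option A
        if {row["dye"] for row in pheno} != set(assay):
            raise ValueError("dye values must match the matrix names")
        for matrix in assay.values():
            if any(len(feature) != len(array_ids) for feature in matrix):
                raise ValueError("each matrix needs one column per array")
        self.array_ids = list(array_ids)
        self.assay = assay
        self.pheno = pheno

    def channel(self, dye):
        """Return one dye's matrix and its phenoData rows in column order."""
        lookup = {(r["arrayID"], r["dye"]): r for r in self.pheno}
        rows = [lookup[(a, dye)] for a in self.array_ids]
        return self.assay[dye], rows

    def subset_arrays(self, keep):
        """Keep the given arrays in every matrix and in phenoData."""
        cols = [self.array_ids.index(a) for a in keep]
        assay = {
            dye: [[feature[j] for j in cols] for feature in matrix]
            for dye, matrix in self.assay.items()
        }
        pheno = [r for r in self.pheno if r["arrayID"] in keep]
        return TwoColorSet(keep, assay, pheno)


# Toy data; the dye names and sample labels are made up.
pheno = [
    {"arrayID": "a1", "dye": "Cy3", "sample": "control"},
    {"arrayID": "a1", "dye": "Cy5", "sample": "treated"},
    {"arrayID": "a2", "dye": "Cy3", "sample": "treated"},
    {"arrayID": "a2", "dye": "Cy5", "sample": "control"},
]
assay = {
    "Cy3": [[100.0, 110.0], [50.0, 60.0]],
    "Cy5": [[200.0, 55.0], [50.0, 120.0]],
}
eset = TwoColorSet(["a1", "a2"], assay, pheno)

# Subsetting by array keeps the matrices and phenoData in step.
sub = eset.subset_arrays(["a2"])
cy5, cy5_pheno = sub.channel("Cy5")
```

In this sketch, array-level subsetting touches every matrix, while
dye-level access is a lookup by name, and the phenoData rows for a
channel line up with that matrix's columns.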

I suspect that the efficiency difference in obtaining an expression
matrix for a particular dye will make option B worth the effort.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


