[Bioc-devel] RFC: eSet with two color data

Francois Pepin fpepin at cs.mcgill.ca
Wed Mar 21 18:19:00 CET 2007


Hi,

Many people (and packages) do rely on the ratios from two-color arrays
(with a reference design, for example). In those cases, using the M and
A values would be reasonable. Adding one or two columns to phenoData
describing the sample on each color would probably be enough. This is what
we generally deal with, and the extension would be minor. I don't know if
it would be worth considering an extension for this specific use case.
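For reference, M is the log-ratio of the two channels and A their average log-intensity. Below is a minimal Python sketch, assuming positive, background-corrected intensities. The function names `ma_values` and `from_ma` and the sample numbers are hypothetical illustrations, not part of Biobase or any other package.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities.

    M = log2(R / G): log-ratio between the two channels.
    A = (log2 R + log2 G) / 2: average log-intensity.
    Assumes all intensities are positive (e.g. after background correction).
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    m, a = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive")
        m.append(math.log2(r) - math.log2(g))
        a.append((math.log2(r) + math.log2(g)) / 2)
    return m, a


def from_ma(m, a):
    """Invert the transform: recover (R, G) from (M, A)."""
    red = [2 ** (ai + mi / 2) for mi, ai in zip(m, a)]
    green = [2 ** (ai - mi / 2) for mi, ai in zip(m, a)]
    return red, green


# Hypothetical intensities for three features on one array.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 128.0]
m, a = ma_values(red, green)
print(m)  # [2.0, 0.0, -1.0]
print(a)  # [9.0, 8.0, 6.5]
```

Keeping A alongside M is what makes the transform invertible (log2 R = A + M/2 and log2 G = A - M/2). It is M on its own that discards the per-channel intensity.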

For the general case, each color would be analyzed individually using
single-channel methods, and I agree that the first option (A) would be
better.
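To make option A concrete, here is a rough Python sketch (not Biobase code) of how n congruent per-channel matrices (option B's shape) can be flattened into one matrix with n columns per array, together with the per-column phenoData records that option A requires. The column names `arrayID` and `dye` come from the proposal in the quoted message. The helper names and the example data are hypothetical.

```python
def to_long_layout(channels, array_ids):
    """Convert option-B style data into option-A style data.

    channels:  dict mapping a dye/label name (e.g. "Cy3") to a matrix given
               as a list of rows (features) by columns (arrays); all
               matrices must have the same shape ("congruent").
    array_ids: one identifier per array (column).

    Returns (exprs, pheno): exprs has one column per (array, dye) pair and
    pheno has one record per column with "arrayID" and "dye" keys.
    """
    dyes = list(channels)
    n_rows = len(channels[dyes[0]])
    for dye in dyes:
        mat = channels[dye]
        if len(mat) != n_rows or any(len(row) != len(array_ids) for row in mat):
            raise ValueError(f"matrix for {dye!r} is not congruent")
    # Column order: array-major, dye-minor, identical in exprs and pheno.
    pheno = [{"arrayID": aid, "dye": dye} for aid in array_ids for dye in dyes]
    exprs = [
        [channels[dye][i][j] for j in range(len(array_ids)) for dye in dyes]
        for i in range(n_rows)
    ]
    return exprs, pheno


def from_long_layout(exprs, pheno):
    """Inverse of to_long_layout: regroup columns by dye, one column per array."""
    array_ids = list(dict.fromkeys(p["arrayID"] for p in pheno))
    dyes = list(dict.fromkeys(p["dye"] for p in pheno))
    col = {(p["arrayID"], p["dye"]): k for k, p in enumerate(pheno)}
    channels = {
        dye: [[row[col[(aid, dye)]] for aid in array_ids] for row in exprs]
        for dye in dyes
    }
    return channels, array_ids


# Hypothetical two-color data: 2 features x 2 arrays per channel.
channels = {"Cy3": [[1.0, 2.0], [3.0, 4.0]], "Cy5": [[10.0, 20.0], [30.0, 40.0]]}
array_ids = ["array1", "array2"]
exprs, pheno = to_long_layout(channels, array_ids)
print(exprs)  # [[1.0, 10.0, 2.0, 20.0], [3.0, 30.0, 4.0, 40.0]]
```

With a single dye the long layout reduces to an ordinary single-channel matrix, which matches the point that A generalizes to any number of colors. Renaming `dye` to `Label`, as suggested in the thread, would only change the key name.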

Francois

On Wed, 2007-03-21 at 10:13 -0500, Kevin R. Coombes wrote: 
> Hi,
> 
> Having thought about this several times, I keep coming back to [A], for 
> exactly the reasons you point out. It has the decided advantage of 
> generalizing to any number of colors (including 1!), which actually 
> suggests that ExpressionSet might be modified to include your required 
> columns.  One might, however, prefer "Label" instead of "dye" to allow 
> for somewhat more generality.
> 
> Best,
> 	Kevin
> 
> Wolfgang Huber wrote:
> > Dear all,
> > 
> > I hope that this question is not too tedious for those who have already
> > thought hard about it, but I am not aware of consensus and good
> > documentation in Biobase on this topic:
> > 
> > How can we best represent preprocessed, normalised data from a set of
> > two- (or n-) colour arrays in an eSet-like structure? I would like to
> > keep the intensity information of each channel rather than reduce it
> > to M-values, since that loses information.
> > 
> > I see two options:
> > 
> > A) in an ExpressionSet-derivative called e.g. "ExpressionSetWithColors"
> > with ncol = n times the number of arrays, and with mandatory phenoData
> > columns named e.g. "arrayID" and "dye".
> > 
> > B) in an eSet-derivative with ncol = the number of arrays, and n
> > congruent matrices in the assayData slot.
> > 
> > Currently I prefer A, because
> > - most of the infrastructure is already there and little additional
> > work is needed
> > - in B, the interpretation of the phenoData columns becomes ambiguous
> > because some columns will refer to the arrays, others to one particular
> > sample of the n hybridised to each array, and we would need additional
> > infrastructure to resolve that.
> > 
> > Is there anything that someone can point out that I am not aware of?
> > 
> > Also (on a different topic), do we already have an ontology in place
> > somewhere for control features (e.g. empty features, features measuring
> > a known spike-in ratio)?
> > 
> > Best wishes
> >  Wolfgang
> > 
> > ------------------------------------------------------------------
> > Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
> > 
> > _______________________________________________
> > Bioc-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
