[Bioc-devel] eSet questions

Kevin R. Coombes krc at mdacc.tmc.edu
Wed Jan 10 21:15:29 CET 2007


Hi,

First, according to the manual pages (in BioConductor 1.9), the 
"annotatedDataSet" class is a "virtual superset for 'exprSet' , 'eSet', 
etc".  While this seems to be the case for the soon-to-be-deprecated 
exprSet, it seems not to be the case for an eSet.  Is that 
interpretation correct?


Now to the real question.  While trying to think about how to handle a 
couple of data sets, I've started to become convinced that the current 
design of an eSet could be improved.  As it stands now, the design makes 
some assumptions about how the data should be stored and interpreted 
that I think are unnecessary and make it harder to generalize to other 
data types.

I have three use cases in mind:
[1] A vanilla two-color mRNA microarray expression data set, but one 
that is not quantified with a software package currently recognized by 
either limma or marray.
[2] A MINiML format file containing glass array data from GEO
[3] Reverse phase protein array (RPPA) data

In the first two cases, I'd like to be able to get all the raw data 
files into R as quickly as possible, and work there to figure out which 
columns represent red and green foreground and background, after which I 
can convert from the input format to something where I can use limma or 
marray.

In the RPPA case, the notions of "featureData" and "phenoData" are 
reversed.  Lysates of individual patient samples are spotted on the 
array, which is then probed with a mono-specific antibody targeting one 
protein. (See, for example, Tibes et al., Mol Cancer Ther 2006; 5:2512-21.)

One way to handle all three cases would be in something I'm tentatively 
calling an "ArrayCube", which should correspond fairly closely to a set 
of files on a hard drive.  Each file holds a two-dimensional table, 
where the rows correspond to spots on an array and the columns 
correspond to various things measured by a quantification software 
package.  An ArrayCube can be thought of conceptually as a list of these 
two-dimensional objects, where this third (list) dimension corresponds 
to whatever label-producing stuff was hybridized or incubated on the array.
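
To make the "list of two-dimensional tables" idea concrete, here is a 
minimal sketch in base R.  The file names, column names, and values are 
all made up for illustration (the first loop only fabricates two small 
tab-delimited files so that the example runs on its own); nothing here 
is tied to any particular quantification package.

```r
# Create two toy quantification files in a temporary directory.
# (Hypothetical data; in practice these files would already exist on disk.)
dataDir <- tempfile("arraycube")
dir.create(dataDir)
for (hyb in c("hybA", "hybB")) {
  tab <- data.frame(spot = 1:3,
                    redFg = c(500, 820, 130),
                    greenFg = c(450, 300, 900))
  write.table(tab, file.path(dataDir, paste0(hyb, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read every file as-is: one data frame per file, rows = spots,
# columns = whatever the quantification software reported.
files <- list.files(dataDir, pattern = "\\.txt$", full.names = TRUE)
cube <- lapply(files, read.delim)

# Name the list (third) dimension after the files / hybridizations.
names(cube) <- sub("\\.txt$", "", basename(files))
```

The point of the sketch is only that nothing has to be known about the 
meaning of the columns at load time; that decision can be deferred 
until the data are already in R.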

Given this description, one might attempt a design something like

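library(Biobase)  # provides AssayData, MIAME, and AnnotatedDataFrame

setClass("ArrayCube", representation = representation(
	rawData = "AssayData",                    # one table per file
	experimentData = "MIAME",                 # experiment-level description
	featureData = "AnnotatedDataFrame",       # describes rows (spots)
	hybridizationData = "AnnotatedDataFrame", # describes the list dimension
	measurementData = "AnnotatedDataFrame"    # describes measurement columns
))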

This obviously looks a lot like an eSet.  The differences are
[1] I am thinking about the rawData entry as a list of data frames (or 
data matrices), with each one corresponding to a unique file on the hard 
disk.  These would be easy to read into R in the use cases above, but 
violate one of the validity constraints on the assayData object in the 
current eSet. (Specifically, the constraint that the columns in any 
matrix in the assayData object must correspond to rows of the phenoData 
object.)
[2] The featureData slot would describe the rows in each of those data 
matrices.  In order to accommodate the RPPA data, however, featureData 
might refer to patient samples instead of the genes that it would refer 
to in the eSet design.
[3] Similarly, hybridizationData would replace the phenoData slot, and 
it also could refer to samples or to genes/proteins depending on the 
data type.  Also, this replacement for the phenoData object has to 
describe the "list" dimension of the rawData instead of the "column" 
dimension.
[4] The measurementData slot would describe the measurement columns from 
the software.  For already known packages, it would then be easy to 
convert an ArrayCube into, for example, an RGList by slicing along the 
desired measurement columns.  For novel quantification packages, one 
could make an interface that lets the user specify which measurements 
have which interpretation, and then make an RGList after they have had a 
chance to load the data easily and start exploring it.
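
As a rough sketch of the "slicing" in item [4], assuming the raw data 
are held as a named list of data frames: the column names (redFg, 
redBg, greenFg, greenBg), the `roles` mapping, and the helper 
`sliceCube` are all hypothetical, and the values are invented.  The 
result is a plain list with the component names R, G, Rb, and Gb that 
limma's RGList uses, not an actual RGList object.

```r
# Toy ArrayCube: a named list of per-hybridization tables (made-up values).
cube <- list(
  hybA = data.frame(redFg = c(500, 820, 130), redBg = c(50, 60, 40),
                    greenFg = c(450, 300, 900), greenBg = c(55, 45, 65)),
  hybB = data.frame(redFg = c(610, 700, 210), redBg = c(48, 52, 44),
                    greenFg = c(400, 350, 880), greenBg = c(51, 49, 60))
)

# User-supplied interpretation of the measurement columns.  For a known
# quantification package this mapping could be built in; for a novel one
# the user would fill it in after exploring the data.
roles <- c(R = "redFg", G = "greenFg", Rb = "redBg", Gb = "greenBg")

# Pull one measurement column out of every table, giving a matrix with
# rows = spots and columns = hybridizations.
sliceCube <- function(cube, column) {
  do.call(cbind, lapply(cube, function(tab) tab[[column]]))
}

# One matrix per role, named R, G, Rb, Gb.
rg <- lapply(roles, function(column) sliceCube(cube, column))
```

With limma loaded, something like new("RGList", rg) should then wrap 
this list, though I have not tested that here.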

In any event, before I head further down this road, I'd like to get some 
feedback on whether it would be [a] feasible or [b] desirable either to 
create such a thing or to change the design of an eSet into such a thing.

Best,
	Kevin


