[Bioc-devel] Modified eSet class definition, updated serialized eSet objects, and broken BioC 2.4 builds

Patrick Aboyoun paboyoun at fhcrc.org
Tue Aug 4 20:11:29 CEST 2009

After receiving your collective feedback on the recent modifications I made 
to the eSet class definition, we changed our design to better reflect the 
reality that microarray data analysts face.

In the newly updated Biobase package, we take the view that covariates (be 
they phenotype, genotype, experimental, etc.) have two main curators: the 
experimenter, and the measuring device, which typically stores its 
covariates as header information in a data file. In terms of the eSet 
class and its derivatives, experimenter-curated covariates should be 
housed in the phenoData (AnnotatedDataFrame) slot, and manufacturer-curated 
covariates are to be stored in a new protocolData (AnnotatedDataFrame) 
slot. The intent is that the new protocolData slot is modified only by a 
data file read operation, such as affy's read.affybatch function.

Giving read functions ownership of the protocolData slot assures 
developers that they are not stomping on the user's data, and it provides 
end users with a clean representation of the metadata contained in the 
original data files. End users can always copy the protocolData 
information into the phenoData slot to make analysis easier. The power of 
the new protocolData slot will depend on the maintainers of packages that 
read in microarray data, since, under current conventions, a fair amount 
of metadata that could be harvested from data file headers is being 
ignored. As part of this change, the scanDate slot has been removed from 
the eSet class.
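To illustrate the split, here is a minimal sketch assuming a current 
Biobase installation (the ExpressionSet() constructor with a protocolData 
argument is assumed from present-day Biobase, not from this post). The 
sample names, the "treatment" covariate, the "ScanDate" column and all 
values are made up for illustration; in practice a read function such as 
read.affybatch, not the user, would fill protocolData.

```r
library(Biobase)

## Toy expression matrix: 3 features x 2 samples (made-up values)
exprs_mat <- matrix(c(5.1, 6.2, 7.3, 5.4, 6.0, 7.9), nrow = 3,
                    dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

## Experimenter-curated covariates go in phenoData
pheno <- AnnotatedDataFrame(data.frame(treatment = c("control", "drug"),
                                       row.names = c("s1", "s2")))

## Device-curated covariates (hypothetical header values that a read
## function might harvest) go in protocolData
protocol <- AnnotatedDataFrame(data.frame(ScanDate = c("2009-07-01", "2009-07-02"),
                                          row.names = c("s1", "s2")))

eset <- ExpressionSet(assayData = exprs_mat, phenoData = pheno,
                      protocolData = protocol)
validObject(eset)  # errors if the slots are inconsistent

## The end user may copy protocol metadata into phenoData for analysis;
## protocolData itself is left untouched
pData(eset) <- cbind(pData(eset), pData(protocolData(eset)))
varLabels(eset)  # "treatment" "ScanDate"
```

Copying rather than moving keeps protocolData as an unmodified record of 
what the data files contained, in line with the read-function ownership 
described above.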

To make this transition smoother, I spent a day or so updating all the 
serialized eSet objects I could find so that they have the protocolData 
slot and pass a validObject() check. From my examination, all is well in 
the BioC 2.5 branch, but the BioC 2.4 branch now has a number of build 
failures: we don't fork data experiment packages, and data experiment 
packages that contain newly serialized eSet objects will not work with 
the release branch. These build failures do not affect end users of 
BioC 2.4, since bioconductor.org hosts versions of the data packages that 
contain the old serialized eSet objects. The main problem now is that if 
you wish to patch a BioC 2.4 package and your package does not build 
because it depends on a data experiment package, we will need to 
hand-build and push your package. This issue may lead us to rethink our 
policy of not forking data experiment packages in svn. Forking them has a 
cost, and so far that cost has outweighed the benefits we would receive. 
If the benefits grow, we will adopt the same svn forking approach we use 
for the software packages.

Thanks again for your feedback the first time around.

- The Biocore Team
