[Bioc-devel] Modified eSet class definition, updated serialized eSet objects, and broken BioC 2.4 builds
Patrick Aboyoun
paboyoun at fhcrc.org
Tue Aug 4 20:11:29 CEST 2009
After receiving your collective feedback on the recent eSet class
definition modifications I made, we changed our design to better reflect
the reality microarray data analysts face.
In the newly updated Biobase package we take the view that there are two
main curators of covariates (be they of type phenotype, genotype,
experimental, etc.): those covariates recorded by the experimenter and
those covariates recorded by the measuring device (typically stored as
header information in a data file). In terms of the eSet class and its
derivatives, experimenter curated covariates should be housed in the
phenoData (AnnotatedDataFrame) slot and manufacturer curated covariates
are to be stored in a new protocolData (AnnotatedDataFrame) slot. The
intent is that the new protocolData slot is only modified by a data file
read operation, like affy's read.affybatch function. Read function
ownership of the protocolData slot will provide the developer with the
assurance that they are not stomping on the user's data as well as
provide the end user with a clean representation of what metadata was
contained within the original data files. The end user can always copy
the protocolData information into the phenoData slot to make analysis
easier. How useful the new protocolData slot becomes will depend on the
maintainers of packages that read in microarray data, since a fair
amount of metadata that could be harvested from data file headers is,
by current convention, being ignored. As part of this change, the
scanDate slot has been removed from the eSet class.
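A minimal sketch of the split between the two slots, using a toy
ExpressionSet. The sample names, the treatment column, and the ScanDate
column and its values are made up for illustration; in practice a read
function such as affy's read.affybatch would be the one populating
protocolData.

```r
library(Biobase)

# Toy assay data: 3 features x 2 samples (illustrative values only).
exprs_mat <- matrix(c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5), nrow = 3,
                    dimnames = list(c("probe1", "probe2", "probe3"),
                                    c("s1", "s2")))

# Experimenter-curated covariates belong in phenoData.
pheno <- AnnotatedDataFrame(
  data.frame(treatment = c("control", "drug"),
             row.names = c("s1", "s2")))

# Device-recorded covariates belong in protocolData.
# Built by hand here; normally a file-reading function would fill it in.
proto <- AnnotatedDataFrame(
  data.frame(ScanDate = c("2009-07-30", "2009-07-31"),
             row.names = c("s1", "s2")))

eset <- ExpressionSet(assayData = exprs_mat,
                      phenoData = pheno,
                      protocolData = proto)

# Inspect what came from the data files without touching the user's covariates.
pData(protocolData(eset))

# Copy a device-recorded covariate into phenoData to make analysis easier.
eset$ScanDate <- pData(protocolData(eset))$ScanDate
validObject(eset)
```

After the copy, protocolData still holds the untouched file metadata,
while the phenoData copy is free for the user to edit.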
To make this transition smoother, I spent a day or so updating all the
serialized eSet objects I could find so they will have the protocolData
slot and pass a validObject() check. From my examination, all is well in
the BioC 2.5 branch, but now the BioC 2.4 branch has a number of build
failures because we don't fork data experiment packages, and data
experiment packages that have newly serialized eSet objects will not
work with the release branch. These package build failures do not affect
end users of BioC 2.4 since bioconductor.org has versions of the data
packages that contain the old serialized eSet objects. The main problem
we have now is that if you wish to patch a BioC 2.4 package and your
package does not build due to a dependency on a data experiment package,
we will need to hand build and push your package. This issue may get us
to rethink our policy of not forking data experiment packages in svn.
There is a cost with forking the data experiment packages and so far it
has outweighed the benefits we would receive. If we find the benefits
rising, we will adopt the same svn forking approach we have with the
software packages.
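For package maintainers who keep their own serialized eSet objects, the
refresh described earlier in this message can be sketched as below. This
is an assumption about one reasonable way to do it, not a record of the
steps actually taken: it relies on Biobase's updateObject() and on
validObject(), and the helper name refresh_esets and the toy file are
hypothetical.

```r
library(methods)
library(Biobase)

# Hypothetical helper: refresh every eSet-derived object stored in an .rda file.
refresh_esets <- function(rda_file) {
  env <- new.env()
  obj_names <- load(rda_file, envir = env)
  for (nm in obj_names) {
    obj <- get(nm, envir = env)
    if (is(obj, "eSet")) {
      obj <- updateObject(obj)  # bring the instance up to the current class definition
      validObject(obj)          # signals an error if the object is still invalid
      assign(nm, obj, envir = env)
    }
  }
  # Write all objects back under their original names.
  save(list = obj_names, file = rda_file, envir = env)
  invisible(obj_names)
}

# Usage on a throwaway file containing a toy ExpressionSet.
tmp <- tempfile(fileext = ".rda")
toy <- ExpressionSet(
  assayData = matrix(0, nrow = 2, ncol = 2,
                     dimnames = list(c("p1", "p2"), c("s1", "s2"))))
save(toy, file = tmp)
refreshed <- refresh_esets(tmp)
```

Objects re-saved this way carry the new class layout, so, as noted
above, they should only be committed where the matching Biobase version
is in use.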
Thanks again for your feedback the first go around.
- The Biocore Team