[Bioc-devel] Biobase / eSet changes for this release

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 13 23:37:37 CEST 2006

Hi Rafael,

An approach is to create a new class derived from eSet with
initialization and validation methods to specify the additional
elements to be stored in assayData. For the additions you suggest, the
following might be a useful template:

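setClass("SnpSetPlus", contains="eSet")

setMethod("initialize", "SnpSetPlus",
          function(.Object,
                   phenoData = new("AnnotatedDataFrame"),
                   experimentData = new("MIAME"),
                   annotation = character(),
                   call = new("matrix"),
                   callProbability = new("matrix"),
                   copyNumber = new("matrix"),
                   copyNumberProbability = new("matrix"),
                   ... ) {
            callNextMethod(.Object,
                           assayData = assayDataNew(
                             call = call,
                             callProbability = callProbability,
                             copyNumber = copyNumber,
                             copyNumberProbability = copyNumberProbability,
                             ...),
                           phenoData = phenoData,
                           experimentData = experimentData,
                           annotation = annotation)
          })

setValidity("SnpSetPlus", function(object) {
  assayDataValidMembers(assayData(object),
                        c("call", "callProbability",
                          "copyNumber", "copyNumberProbability"))
})

obj <- new("SnpSetPlus")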


Rafael A Irizarry <ririzarr at jhsph.edu> writes:

> Hi!
> Looks good.
> One comment:
> Typically, SnpSet will have more than just the allele calls and p-values.
> It will also have a copy number estimate and its measure of
> uncertainty. How hard is it to add these?
> You would need one more array  just like the one you used for the
> genotype calls.
> -r
> Martin Morgan wrote:
>>Biobase/eSet developers,
>>Here is a brief summary of the version of eSet to be included in
>>this release; the code builds and checks without error, though missing
>>documentation (to be corrected within the week) means that there are
>>still warning messages during check.  The most recent changes are in
>>There is one very recent change, to the overall class structure, that
>>we agonized over a great deal before making at the last moment.  We
>>recognize that this is very unfortunate timing, and that it will cause
>>needless work for bioconductors; we will help out as much as possible.
>>There are three major changes:
>>1. Change in class structure.
>>eSet -- VIRTUAL
>>  ExpressionSet
>>  SnpSet
>>  (TilingSet -- not implemented)
>>The main functionality of eSet is to coordinate assayData, phenoData,
>>experimentData, and the annotation.  eSet is also a generalized
>>container, with high-throughput data stored in the assayData
>>slot. eSet is a VIRTUAL class; if you want to store and manipulate a
>>consistent set of elements in the assay data slot you should create a
>>subclass of eSet. An example of how to do this is below.
>>ExpressionSet requires that the assayData slot contain matrix element
>>'exprs'; other elements (of dimension identical to exprs) are
>>permitted. as(exprSet, "ExpressionSet") coerces exprSet objects to
>>ExpressionSet, perhaps issuing warnings if ambiguities arise.
>>obj <- as(sample.exprSet, "ExpressionSet")
>>SnpSet is meant to contain SNP data in a manner analogous to
>>ExpressionSet; 'call' and 'callProbability' are required assayData
>>elements providing information on the call and a statement of
>>confidence in the call. The exact structure of these matrices is not
>>specified, but the idea is that 'call' encodes diploid genotypes.
>>2. Change in assayData storage
>>The assayData slot is an AssayData class union of 'list' and
>>'environment'; as a class union, there is no 'initialize'
>>method. Instead, the list or environment can be populated with
>>elements using a call to assayDataNew(...).
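
To illustrate why a class union has no 'initialize' of its own, here is a minimal sketch in base R (methods package only). The name AssayDataSketch is a hypothetical stand-in, not the Biobase class itself.

```r
library(methods)

# Hypothetical stand-in for a union of 'list' and 'environment'
setClassUnion("AssayDataSketch", c("list", "environment"))

# Objects of either member type satisfy the union
is_list_ok <- is(list(exprs = matrix(0, 2, 2)), "AssayDataSketch")
is_env_ok  <- is(new.env(), "AssayDataSketch")

# The union is virtual, so new() cannot instantiate it directly;
# the error is caught here and reported as the string "error"
direct_new <- tryCatch(new("AssayDataSketch"), error = function(e) "error")
```

Because the union cannot be instantiated, a constructor-style helper is the natural way to build the underlying list or environment.
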
>>An innovation is the storageMode method, which can be used to change
>>how elements in assayData are stored. In particular the storageMode
>>can be 'lockedEnvironment', and indeed this is the default. An
>>environment is locked in the sense that new elements cannot be added
>>to the environment, and existing elements cannot be changed. This
>>means that the pass-by-reference semantics of environments will not
>>catch users off-guard:
>>obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>storageMode(obj) <- "environment"
>>obj1 <- obj
>>exprs(obj1) <- exprs(obj1)[1:10,1:5]
>>dims(obj) # yikes! obj exprs dimensions changed!
>>obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>storageMode(obj) <- "environment"
>>obj1 <- obj
>>exprs(obj1) <- log(exprs(obj))
>>identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!
>>obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>obj1 <- obj
>>exprs(obj1) <- log(exprs(obj1))
>>identical(exprs(obj1),exprs(obj)) # FALSE: good!
>>Note that attempts to directly change slots in locked environments
>>cause an error
>>>assayData(obj1)$exprs <- NULL
>> Error: cannot change value of a locked binding.
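
The same behaviour can be reproduced with plain environments. This is a sketch in base R only; it assumes nothing about how Biobase implements storageMode, and all variable names are illustrative.

```r
# Environments have reference semantics: assigning the variable copies
# only the reference, so both names see the same contents.
env <- new.env()
assign("exprs", matrix(1:6, nrow = 3), envir = env)
alias <- env
assign("exprs", matrix(1:4, nrow = 2), envir = alias)
dim_after <- dim(get("exprs", envir = env))  # changed through the alias

# Locking the environment and its bindings blocks in-place changes.
lockEnvironment(env, bindings = TRUE)
modify_locked <- tryCatch({
  assign("exprs", NULL, envir = env)   # existing binding is locked
  "modified"
}, error = function(e) "error")
add_locked <- tryCatch({
  assign("newElt", 1, envir = env)     # no new bindings allowed
  "added"
}, error = function(e) "error")
```

Both attempts on the locked environment end in the error branch, while the unlocked alias silently changed the shared contents.
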
>>The setReplaceMethod for exprs (and assayData) succeeds by performing
>>a deep copy of the entire environment. Because this is very
>>inefficient, the recommended paradigm to update an element in a
>>lockedEnvironment is to extract it, make many changes, and then
>>reassign it, e.g.,
>>ex <- exprs(obj1)
>># many changes, ex <- log(ex), ...
>>exprs(obj1) <- ex
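
A rough base R sketch of the copy-then-replace idea follows. The helper replace_in_locked is hypothetical and only mimics the described behaviour; it is not the Biobase replacement method.

```r
# Hypothetical helper: copy every element into a fresh environment,
# replace one element, then lock the copy. The original is untouched.
replace_in_locked <- function(env, name, value) {
  copy <- new.env(parent = emptyenv())
  for (nm in ls(env, all.names = TRUE)) {
    assign(nm, get(nm, envir = env), envir = copy)
  }
  assign(name, value, envir = copy)
  lockEnvironment(copy, bindings = TRUE)
  copy
}

orig <- new.env(parent = emptyenv())
assign("exprs", matrix(c(1, 10, 100, 1000), nrow = 2), envir = orig)
lockEnvironment(orig, bindings = TRUE)

ex <- get("exprs", envir = orig)   # extract once
ex <- log10(ex)                    # make all changes on the plain matrix
updated <- replace_in_locked(orig, "exprs", ex)  # one copy, one reassignment
```

Each call to the helper copies the whole environment, which is why batching the changes on the extracted matrix and reassigning once is cheaper than many small replacements.
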
>>lockedEnvironment offers some efficiency in copying objects, because
>>the environment is not copied during function calls. This is not
>>completely satisfactory, though
>>func <- function(assayData) # good: contents of env will not be copied
>>  max(exprs(assayData)) # not so good: exprs copied from environment
>>3. Changes in other slots
>>Other slots have been changed to treat variable metadata more
>>efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
>>simplify the type of data stored as experimentData. These changes are
>>mostly in line with the web discussions.
>>In making these changes, I have tried not to break the existing
>>interface beyond what is necessary for the new functionality (e.g.,
>>pData still returns the 'data' part of phenoData). One difference,
>>though, is that the methods dim, ncol, etc return a vector of
>>dimensions reflecting the shared dimensionality of the assayData
>>members; dims returns an array of dimensions of each element.
>>These changes affect eSets; any difficulties you might have with
>>exprSet probably reflect changes made several months ago to validity
>>Please let me know of any feedback,
>>The original 'sample.eSet' contains four elements in the assayData
>>slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
>>a class, and provide initialization and validation
>>methods. Optionally, update previous eSet data structures to your new
>>class. For instance,
>>setClass("SwirlSet", contains="eSet")
>>setMethod("initialize", "SwirlSet",
>>          function(.Object,
>>                   phenoData = new("AnnotatedDataFrame"),
>>                   experimentData = new("MIAME"),
>>                   annotation = character(),
>>                   R = new("matrix"),
>>                   G = new("matrix"),
>>                   Rb = new("matrix"),
>>                   Gb = new("matrix"),
>>                   ... ) {
>>            callNextMethod(.Object,
>>                           assayData = assayDataNew(
>>                             R=R, G=G, Rb=Rb, Gb=Gb,
>>                             ...),
>>                           phenoData = phenoData,
>>                           experimentData = experimentData,
>>                           annotation = annotation)
>>          })
>>setValidity("SwirlSet", function(object) {
>>  assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
>>})
>>obj <- updateOldESet(sample.eSet,"SwirlSet")