[Bioc-devel] Biobase / eSet changes for this release

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 13 08:25:40 CEST 2006


Biobase/eSet developers,

Here is a brief summary of the version of eSet to be included in this
release. The code builds and checks without error, though missing
documentation (to be completed within the week) means that there are
still warning messages during check. The most recent changes are in
svn.

There is one very recent change, to the overall class structure, that
we agonized over a great deal before making at the last moment.  We
recognize that this is very unfortunate timing, and that it will cause
needless work for bioconductors; we will help out as much as possible.


There are three major changes:

1. Change in class structure.

eSet -- VIRTUAL
  ExpressionSet
  SnpSet
  (TilingSet -- not implemented)

The main functionality of eSet is to coordinate assayData, phenoData,
experimentData, and the annotation. eSet is also a generalized
container, with high-throughput data stored in the assayData
slot. eSet is a VIRTUAL class; if you want to store and manipulate a
consistent set of elements in the assay data slot you should create a
subclass of eSet. An example of how to do this is below.

ExpressionSet requires that the assayData slot contain a matrix element
named 'exprs'; other elements (with dimensions identical to those of
exprs) are permitted. as(exprSet, "ExpressionSet") coerces exprSet
objects to ExpressionSet, possibly issuing warnings if ambiguities arise.
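To make the dimension requirement concrete, here is a minimal base-R sketch that checks a plain list of matrices in the way the paragraph describes. The helper checkExprsElements and the element name 'se.exprs' are hypothetical illustrations, not Biobase functions; Biobase performs its own validity checking.

```r
# Hypothetical helper (not part of Biobase): check that a list of assay
# matrices contains 'exprs' and that every element has the same
# dimensions as 'exprs'. Returns TRUE or a character vector describing
# the problems, following the convention of S4 validity functions.
checkExprsElements <- function(elements) {
  if (!"exprs" %in% names(elements))
    return("'exprs' element is required")
  ref <- dim(elements[["exprs"]])
  sameDim <- vapply(elements, function(m) identical(dim(m), ref), logical(1))
  bad <- names(elements)[!sameDim]
  if (length(bad) > 0)
    return(paste0("element '", bad, "' has dimensions differing from 'exprs'"))
  TRUE
}

ok  <- list(exprs = matrix(1:6, 2), se.exprs = matrix(0, 2, 3))
bad <- list(exprs = matrix(1:6, 2), se.exprs = matrix(0, 3, 2))
checkExprsElements(ok)   # TRUE
checkExprsElements(bad)  # a message naming 'se.exprs'
```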

library(Biobase)
data(sample.exprSet)
obj <- as(sample.exprSet, "ExpressionSet")
obj

SnpSet is meant to contain SNP data in a manner analogous to
ExpressionSet; 'call' and 'callProbability' are required assayData
elements providing information on the call and a statement of
confidence in the call. The exact structure of these matrices is not
specified, but the idea is that 'call' encodes diploid genotypes.

2. Change in assayData storage

The assayData slot has class AssayData, a class union of 'list' and
'environment'; because it is a class union, there is no 'initialize'
method. Instead, the list or environment can be populated with
elements using a call to assayDataNew(...).
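A class union like AssayData can be sketched with the methods package alone. The names AssayDataSketch and ContainerSketch below are hypothetical stand-ins chosen to avoid clashing with Biobase; the point is only that a slot typed as a union of 'list' and 'environment' accepts either kind of object.

```r
library(methods)

# Hypothetical class union mirroring the idea above: a slot of this
# type may hold either a list or an environment.
setClassUnion("AssayDataSketch", c("list", "environment"))

setClass("ContainerSketch", representation(assayData = "AssayDataSketch"))

# Store the assay data as a list...
asList <- new("ContainerSketch", assayData = list(exprs = matrix(1:4, 2)))

# ...or as an environment.
env <- new.env()
assign("exprs", matrix(1:4, 2), envir = env)
asEnv <- new("ContainerSketch", assayData = env)

is(asList@assayData, "AssayDataSketch")  # TRUE
is(asEnv@assayData, "AssayDataSketch")   # TRUE
```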

An innovation is the storageMode method, which can be used to change
how elements in assayData are stored. In particular, the storageMode
can be 'lockedEnvironment', which is the default. A locked
environment is one to which new elements cannot be added and in which
existing elements cannot be changed. This means that the
pass-by-reference semantics of environments will not catch users
off guard. Compare 'environment' storage (the first two examples)
with the default 'lockedEnvironment' storage (the third):
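The locking behavior itself can be illustrated with base R's lockEnvironment(). This sketch assumes that a 'lockedEnvironment' behaves like an environment locked with bindings = TRUE; it does not call Biobase's storageMode machinery. The last few lines show the aliasing that an unlocked environment permits.

```r
# Lock both the environment (no new bindings) and its existing bindings.
env <- new.env()
assign("exprs", matrix(1:4, 2), envir = env)
lockEnvironment(env, bindings = TRUE)

addFails <- inherits(try(assign("extra", 1, envir = env), silent = TRUE),
                     "try-error")
changeFails <- inherits(try(assign("exprs", matrix(0, 2, 2), envir = env),
                            silent = TRUE), "try-error")
addFails     # TRUE: cannot add a binding to a locked environment
changeFails  # TRUE: cannot change a locked binding

# By contrast, an unlocked environment is shared by reference: a change
# made through one name is visible through every other name bound to it.
shared <- new.env()
assign("exprs", 1:3, envir = shared)
alias <- shared
assign("exprs", log(1:3), envir = alias)
identical(get("exprs", envir = shared), log(1:3))  # TRUE
```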

obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
storageMode(obj) <- "environment"
obj1 <- obj
exprs(obj1) <- exprs(obj1)[1:10,1:5]
dims(obj) # yikes! obj exprs dimensions changed!

obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
storageMode(obj) <- "environment"
obj1 <- obj
exprs(obj1) <- log(exprs(obj))
identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!

obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
obj1 <- obj
exprs(obj1) <- log(exprs(obj1))
identical(exprs(obj1),exprs(obj)) # FALSE: good!

Note that attempts to change elements of a locked environment directly
cause an error:

> assayData(obj1)$exprs <- NULL
Error: cannot change value of a locked binding. 

The replacement method for exprs (and assayData) succeeds by performing
a deep copy of the entire environment. Because this is very
inefficient, the recommended way to update an element in a
lockedEnvironment is to extract it, make all the changes, and then
reassign it, e.g.,

ex <- exprs(obj1)
# many changes, ex <- log(ex), ...
exprs(obj1) <- ex
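Why replacement is expensive, and why extracting once helps, can be sketched in base R. replaceLockedElement is a hypothetical helper written for illustration, not Biobase's replacement method: it copies every element into a fresh environment, substitutes one element, and locks the result, so its cost grows with the whole assayData rather than with the single element being changed.

```r
# Hypothetical helper (not Biobase's implementation): build a new locked
# environment holding copies of all elements of 'env', with 'name'
# replaced by 'value'. The original environment is left untouched.
replaceLockedElement <- function(env, name, value) {
  fresh <- new.env()
  for (nm in ls(env, all.names = TRUE))
    assign(nm, get(nm, envir = env), envir = fresh)
  assign(name, value, envir = fresh)
  lockEnvironment(fresh, bindings = TRUE)
  fresh
}

env <- new.env()
assign("exprs", matrix(c(1, 10, 100, 1000), 2), envir = env)
lockEnvironment(env, bindings = TRUE)

# Recommended pattern: extract once, make many changes, reassign once,
# so the whole-environment copy happens a single time.
ex <- get("exprs", envir = env)
ex <- log10(ex)
ex <- round(ex)
env2 <- replaceLockedElement(env, "exprs", ex)

get("exprs", envir = env)   # original values, unchanged
get("exprs", envir = env2)  # transformed values
```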

Storage as a lockedEnvironment offers some efficiency when copying
objects, because the environment is not copied during function
calls. This is not completely satisfactory, though:

func <- function(assayData) # good: contents of env will not be copied
  max(exprs(assayData)) # not so good: exprs copied from environment
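The first comment (the environment itself is not copied) can be checked in base R: a function that adds a binding to an environment it was passed leaves that binding visible to the caller, whereas the same change to a list argument is not. The function names are hypothetical, and an unlocked environment is used so that the binding can be added.

```r
env <- new.env()
assign("exprs", matrix(c(2, 7, 1, 8), 2), envir = env)

# The function receives a reference to env, not a copy, so a binding
# it creates is visible to the caller afterwards.
markSeen <- function(e) assign("seen", TRUE, envir = e)
markSeen(env)
exists("seen", envir = env, inherits = FALSE)  # TRUE

# Reading an element from the environment inside a function.
maxExprs <- function(e) max(get("exprs", envir = e))
maxExprs(env)  # 8

# A list argument behaves differently: the function modifies its own
# copy, and the caller's list is unchanged.
lst <- list(exprs = matrix(c(2, 7, 1, 8), 2))
markSeenList <- function(l) { l$seen <- TRUE; l }
invisible(markSeenList(lst))
"seen" %in% names(lst)  # FALSE
```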


3. Changes in other slots

Other slots have been changed to treat variable metadata more
efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
simplify the type of data stored as experimentData. These changes are
mostly in line with the web discussions.



In making these changes, I have tried not to break the existing
interface beyond what is necessary for the new functionality (e.g.,
pData still returns the 'data' part of phenoData). One difference,
though, is that the methods dim, ncol, etc. return a vector of
dimensions reflecting the shared dimensionality of the assayData
members; dims returns an array of the dimensions of each element.
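The distinction between the shared dimensions and the per-element dimensions can be sketched on a plain list of matrices. elementDims and sharedDim are hypothetical helpers, not the Biobase methods, and the row labels 'nrow' and 'ncol' are an arbitrary choice for this illustration.

```r
# Hypothetical helpers (not Biobase's dims/dim methods) operating on a
# plain list of assay matrices.

# One column per element, one row per dimension.
elementDims <- function(elements) {
  d <- sapply(elements, dim)
  rownames(d) <- c("nrow", "ncol")
  d
}

# The single dimension vector shared by all elements; an error if the
# elements disagree.
sharedDim <- function(elements) {
  d <- elementDims(elements)
  if (any(d != d[, 1]))
    stop("assay elements do not share dimensions")
  d[, 1]
}

elements <- list(exprs = matrix(0, 5, 3), se.exprs = matrix(1, 5, 3))
elementDims(elements)  # 2 x 2 matrix: each element is 5 x 3
sharedDim(elements)    # nrow = 5, ncol = 3
```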

These changes affect eSets; any difficulties you might have with
exprSet probably reflect changes made several months ago to validity
checking.

Please let me know of any feedback,

Martin
--

The original 'sample.eSet' contains four elements in the assayData
slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
a class and provide initialization and validation methods.
Optionally, update previous eSet data structures to your new class.
For instance,

setClass("SwirlSet", contains="eSet")

setMethod("initialize", "SwirlSet",
          function(.Object,
                   phenoData = new("AnnotatedDataFrame"),
                   experimentData = new("MIAME"),
                   annotation = character(),
                   R = new("matrix"),
                   G = new("matrix"),
                   Rb = new("matrix"),
                   Gb = new("matrix"),
                   ... ) {
            callNextMethod(.Object,
                           assayData = assayDataNew(
                             R=R, G=G, Rb=Rb, Gb=Gb,
                             ...),
                           phenoData = phenoData,
                           experimentData = experimentData,
                           annotation = annotation)
          })

setValidity("SwirlSet", function(object) {
  assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
})

data(sample.eSet)
obj <- updateOldESet(sample.eSet,"SwirlSet")


