[Bioc-devel] Biobase / eSet changes for this release

Rafael A Irizarry ririzarr at jhsph.edu
Tue Apr 18 16:19:18 CEST 2006


Martin,

Sorry for taking so long in responding.

We seem to be almost there but I have two concerns:

1) SnpSet as it is will not be useful. As far as I know, only Rob and I
are developing software that will use SnpSet. Both of us need a slot
for copy number. Otherwise we will need to create a new class, which will
be the only one used, and we won't get to use the name SnpSet.

2) It seems that in general we will be storing an estimate (expression,
calls, copy number) and some kind of measure of uncertainty (SE for
expression, p-value for calls, etc.). However, a big chunk of the apps
will not use uncertainty. It would be a shame to have to store a matrix
of NAs every single time. How hard would it be to have eSet accept NULL for
some matrices? The validity check could then look only at the elements
that are not NULL.
Notice that the alternative is to define a new class, which means, in my
case, I'll need two classes for every class I'm defining, or else a
matrix of NAs, which, given the size of data these days, will be very
inefficient.
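A minimal base-R sketch of the NULL-tolerant validity check proposed in point 2, assuming the assay elements are held in a named list. validNonNullMembers() is a hypothetical helper, not a Biobase function: it requires every named element to be present, but compares dimensions only among the non-NULL ones, and it follows the usual S4 validity convention of returning TRUE or a character string.

```r
# Hypothetical helper (not part of Biobase): every required element must be
# present by name, but only non-NULL elements are compared for dimensions.
validNonNullMembers <- function(members, required) {
  missingNames <- setdiff(required, names(members))
  if (length(missingNames) > 0)
    return(paste("missing elements:", paste(missingNames, collapse = ", ")))
  present <- Filter(Negate(is.null), members)
  if (length(present) == 0)
    return(TRUE)
  dims <- lapply(present, dim)
  # each non-NULL element must have the same dim() as the first one
  if (!all(vapply(dims, identical, logical(1), dims[[1]])))
    return("non-NULL elements must have identical dimensions")
  TRUE
}

est <- matrix(rnorm(6), nrow = 3)
# list() keeps a NULL element, so the uncertainty slot exists but stores nothing
validNonNullMembers(list(exprs = est, se.exprs = NULL), c("exprs", "se.exprs"))
```

The NULL placeholder costs almost nothing, whereas a matrix of NAs the size of the estimate doubles the storage.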

I'm cc-ing Ingo and Rob, who maintain SNPscan.

-r

Martin Morgan wrote:

>Biobase/eSet developers,
>
>Here is a brief summary of the version of eSet to be included in
>this release; the code builds and checks without error, though missing
>documentation (to be corrected within the week) means that there are
>still warning messages during check.  The most recent changes are in
>svn.
>
>There is one very recent change, to the overall class structure, that
>we agonized over a great deal before making at the last moment.  We
>recognize that this is very unfortunate timing, and that it will cause
>needless work for bioconductors; we will help out as much as possible.
>
>
>There are three major changes:
>
>1. Change in class structure.
>
>eSet -- VIRTUAL
>  ExpressionSet
>  SnpSet
>  (TilingSet -- not implemented)
>
>The main functionality of eSet is to coordinate assayData, phenoData,
>experimentData, and the annotation.  eSet is also a generalized
>container, with high-throughput data stored in the assayData
>slot. eSet is a VIRTUAL class; if you want to store and manipulate a
>consistent set of elements in the assay data slot you should create a
>subclass of eSet. An example of how to do this is below.
>
>ExpressionSet requires that the assayData slot contain matrix element
>'exprs'; other elements (of dimension identical to exprs) are
>permitted. as(exprSet, "ExpressionSet") coerces exprSet objects to
>ExpressionSet, perhaps issuing warnings if ambiguities arise.
>
>library(Biobase)
>data(sample.exprSet)
>obj <- as(sample.exprSet, "ExpressionSet")
>obj
>
>SnpSet is meant to contain SNP data in a manner analogous to
>ExpressionSet; 'call' and 'callProbability' are required assayData
>elements providing information on the call and a statement of
>confidence in the call. The exact structure of these matrices is not
>specified, but the idea is that 'call' encodes diploid genotypes.
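Since the structure is left open, here is one possible encoding, purely illustrative: an integer 'call' matrix using the hypothetical codes 1 = AA, 2 = AB, 3 = BB, and a 'callProbability' matrix of the same shape, read (by assumption) as the confidence that each call is correct. Neither the codes nor the 0.9 threshold come from SnpSet.

```r
snps <- c("snp1", "snp2", "snp3")
samples <- c("s1", "s2")
assay <- list(
  # Hypothetical diploid genotype codes: 1 = AA, 2 = AB, 3 = BB
  call = matrix(c(1L, 2L, 3L, 2L, 2L, 1L), nrow = 3,
                dimnames = list(snps, samples)),
  # Assumed meaning: confidence that each call is correct
  callProbability = matrix(c(0.99, 0.80, 0.95, 0.60, 0.85, 0.97), nrow = 3,
                           dimnames = list(snps, samples))
)

# Decode integer codes to genotype labels, keeping the matrix shape
genotypeLabels <- c("AA", "AB", "BB")
decoded <- matrix(genotypeLabels[as.vector(assay$call)],
                  nrow = nrow(assay$call), dimnames = dimnames(assay$call))

# Mask calls below an arbitrary confidence threshold
confidentCall <- assay$call
confidentCall[assay$callProbability < 0.9] <- NA
```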
>
>2. Change in assayData storage
>
>The assayData slot is an AssayData class union of 'list' and
>'environment'; as a class union, there is no 'initialize'
>method. Instead, the list or environment can be populated with
>elements using a call to assayDataNew(...).
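To see how a class union of 'list' and 'environment' behaves, here is a base-R sketch using the methods package. AssayDataSketch and assayDataSketchNew() are hypothetical stand-ins, loosely modelled on the description of AssayData and assayDataNew() above, not the Biobase definitions; the storage.mode argument mirrors the three storage modes named in this message.

```r
library(methods)

# Hypothetical stand-in for the AssayData union (not the Biobase class)
setClassUnion("AssayDataSketch", c("list", "environment"))

# Hypothetical constructor: a class union has no initialize method, so a
# plain function builds either a list or a (possibly locked) environment
assayDataSketchNew <- function(...,
    storage.mode = c("lockedEnvironment", "environment", "list")) {
  storage.mode <- match.arg(storage.mode)
  elts <- list(...)
  if (storage.mode == "list")
    return(elts)
  env <- new.env(parent = emptyenv())
  for (nm in names(elts))
    assign(nm, elts[[nm]], envir = env)
  if (storage.mode == "lockedEnvironment")
    lockEnvironment(env, bindings = TRUE)  # no new elements, no changes
  env
}

ad <- assayDataSketchNew(exprs = matrix(1:6, nrow = 3))
is(ad, "AssayDataSketch")
```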
>
>An innovation is the storageMode method, which can be used to change
>how elements in assayData are stored. In particular the storageMode
>can be 'lockedEnvironment', and indeed this is the default. An
>environment is locked in the sense that new elements cannot be added
>to the environment, and existing elements cannot be changed. This
>means that the pass-by-reference semantics of environments will not
>catch users off-guard:
>
>obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>storageMode(obj) <- "environment"
>obj1 <- obj
>exprs(obj1) <- exprs(obj1)[1:10,1:5]
>dims(obj) # yikes! obj exprs dimensions changed!
>
>obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>storageMode(obj) <- "environment"
>obj1 <- obj
>exprs(obj1) <- log(exprs(obj))
>identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!
>
>obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>obj1 <- obj
>exprs(obj1) <- log(exprs(obj1))
>identical(exprs(obj1),exprs(obj)) # FALSE: good!
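The pitfall in the first two examples is ordinary R environment behaviour, independent of Biobase. A minimal base-R sketch: assigning an environment to a second name copies only the reference, and lockEnvironment(..., bindings = TRUE) is the base mechanism that turns an in-place change into an error. Whether Biobase's lockedEnvironment mode uses exactly this call is an assumption here.

```r
# Plain environment: e1 <- e copies the reference, not the contents
e <- new.env()
assign("exprs", matrix(1:6, nrow = 3), envir = e)
e1 <- e
assign("exprs", log(get("exprs", envir = e1)), envir = e1)
sameData <- identical(get("exprs", envir = e), get("exprs", envir = e1))  # TRUE

# Locked environment with locked bindings: in-place changes fail
locked <- new.env()
assign("exprs", matrix(1:6, nrow = 3), envir = locked)
lockEnvironment(locked, bindings = TRUE)
result <- tryCatch({
  assign("exprs", NULL, envir = locked)
  "changed"
}, error = function(err) conditionMessage(err))
result  # the error message, not "changed"
```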
>
>Note that attempts to change elements of a locked environment directly
>cause an error:
>
>> assayData(obj1)$exprs <- NULL
>Error: cannot change value of a locked binding.
>
>The setReplaceMethod for exprs (and assayData) succeeds by performing
>a deep copy of the entire environment. Because this is very
>inefficient, the recommended paradigm to update an element in a
>lockedEnvironment is to extract it, make many changes, and then
>reassign it, e.g.,
>
>ex <- exprs(obj1)
># many changes, ex <- log(ex), ...
>exprs(obj1) <- ex
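One way to picture why the replacement method is expensive, and why the extract, modify, reassign pattern pays that cost only once: a base-R sketch in which a hypothetical replaceInLocked() builds a fresh locked environment holding all the old elements plus the replacement, leaving the original untouched. This illustrates the idea described above and is not the Biobase implementation.

```r
# Hypothetical helper: new locked environment with one element replaced
replaceInLocked <- function(env, name, value) {
  contents <- as.list(env, all.names = TRUE)  # every existing element
  contents[[name]] <- value                   # note: a NULL value would drop it
  newEnv <- list2env(contents, envir = new.env(parent = emptyenv()))
  lockEnvironment(newEnv, bindings = TRUE)
  newEnv
}

orig <- new.env(parent = emptyenv())
assign("exprs", matrix(1:6, nrow = 3), envir = orig)
lockEnvironment(orig, bindings = TRUE)

ex <- get("exprs", envir = orig)              # extract once
ex <- log(ex)                                 # many changes on the local copy
ex <- ex + 1
updated <- replaceInLocked(orig, "exprs", ex) # reassign once: one rebuild
```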
>
>lockedEnvironment offers some efficiency in copying objects, because
>the environment is not copied during function calls. This is not
>completely satisfactory, though:
>
>func <- function(assayData) # good: contents of env will not be copied
>  max(exprs(assayData)) # not so good: exprs copied from environment
>
>
>3. Changes in other slots
>
>Other slots have been changed to treat variable metadata more
>efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
>simplify the type of data stored as experimentData. These changes are
>mostly in line with the web discussions.
>
>
>
>In making these changes, I have tried not to break the existing
>interface beyond what is necessary for the new functionality (e.g.,
>pData still returns the 'data' part of phenoData). One difference,
>though, is that the methods dim, ncol, etc. return a vector of
>dimensions reflecting the shared dimensionality of the assayData
>members; dims returns an array of dimensions of each element.
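The distinction between the shared dimensions reported by dim and the per-element table reported by dims can be mimicked in base R on a plain list of assay matrices; the element names and the row labels below are illustrative choices, not taken from Biobase.

```r
members <- list(exprs = matrix(0, nrow = 4, ncol = 3),
                se.exprs = matrix(0, nrow = 4, ncol = 3))

# dims-like: one column of (rows, columns) per element
elementDims <- sapply(members, dim)
rownames(elementDims) <- c("features", "samples")

# dim-like: meaningful only when every element shares the same dimensions
stopifnot(all(elementDims == elementDims[, 1]))
sharedDim <- elementDims[, 1]
sharedDim
```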
>
>These changes affect eSets; any difficulties you might have with
>exprSet probably reflect changes made several months ago to validity
>checking.
>
>Please let me know of any feedback,
>
>Martin
>--
>
>The original 'sample.eSet' contains four elements in the assayData
>slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
>a class, and provide initialization and validation
>methods. Optionally, update previous eSet data structures to your new
>class. For instance,
>
>setClass("SwirlSet", contains="eSet")
>
>setMethod("initialize", "SwirlSet",
>          function(.Object,
>                   phenoData = new("AnnotatedDataFrame"),
>                   experimentData = new("MIAME"),
>                   annotation = character(),
>                   R = new("matrix"),
>                   G = new("matrix"),
>                   Rb = new("matrix"),
>                   Gb = new("matrix"),
>                   ... ) {
>            callNextMethod(.Object,
>                           assayData = assayDataNew(
>                             R=R, G=G, Rb=Rb, Gb=Gb,
>                             ...),
>                           phenoData = phenoData,
>                           experimentData = experimentData,
>                           annotation = annotation)
>          })
>
>setValidity("SwirlSet", function(object) {
>  assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
>})
>
>data(sample.eSet)
>obj <- updateOldESet(sample.eSet,"SwirlSet")
>
>_______________________________________________
>Bioc-devel at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/bioc-devel