[Bioc-devel] prada package and eSet in Biobase

Florian Hahne fhahne at gmx.de
Thu Apr 13 09:49:21 CEST 2006


Hi Martin,
Thanks, I've already seen it on the bioc-devel mailing list. The bad thing
is that I changed my code two days ago to make it work with your
MultiExpressionSet subclass, but now this has been kicked out again. So I
guess I will build my own subclass instead, bad luck...
I still think this would be a useful thing for Biobase to provide: there
seem to be many use cases for it, and I don't see the benefit in
implementing it over and over again. I do see that having
such a general subclass makes validation of the instantiated objects
extremely hard. So what were the considerations for dropping
MultiExpressionSet from Biobase?
Regards,
Florian


Martin Morgan wrote:
> Hi Florian -- Seth had forwarded an email of his to you about eSets. I
> wanted to make sure you were aware of some last minute changes that
> had to be made, as well as the overall revisions to the
> class. Attached is a summary of changes made. You might find the
> example at the end (extending eSet) to be useful for your
> situation. Let me know if I can be of any assistance.
>
> Martin
> --------
> Biobase/eSet developers,
>
> Here is a brief summary of the version of eSet to be included in
> this release; the code builds and checks without error, though missing
> documentation (to be corrected within the week) means that there are
> still warning messages during check.  The most recent changes are in
> svn.
>
> There is one very recent change, to the overall class structure, that
> we agonized over a great deal before making at the last moment.  We
> recognize that this is very unfortunate timing, and that it will cause
> needless work for bioconductors; we will help out as much as possible.
>
>
> There are three major changes:
>
> 1. Change in class structure.
>
> eSet -- VIRTUAL
>   ExpressionSet
>   SnpSet
>   (TilingSet -- not implemented)
>
> The main functionality of eSet is to coordinate assayData, phenoData,
> experimentData, and the annotation.  eSet is also a generalized
> container, with high-throughput data stored in the assayData
> slot. eSet is a VIRTUAL class; if you want to store and manipulate a
> consistent set of elements in the assay data slot you should create a
> subclass of eSet. An example of how to do this is below.
>
> ExpressionSet requires that the assayData slot contain matrix element
> 'exprs'; other elements (of dimension identical to exprs) are
> permitted. as(exprSet, "ExpressionSet") coerces exprSet objects to
> ExpressionSet, perhaps issuing warnings if ambiguities arise.
>
> library(Biobase)
> data(sample.exprSet)
> obj <- as(sample.exprSet, "ExpressionSet")
> obj
>
> SnpSet is meant to contain SNP data in a manner analogous to
> ExpressionSet; 'call' and 'callProbability' are required assayData
> elements providing information on the call and a statement of
> confidence in the call. The exact structure of these matrices is not
> specified, but the idea is that 'call' encodes diploid genotypes.
>
> 2. Change in assayData storage
>
> The assayData slot is an AssayData class union of 'list' and
> 'environment'; as a class union, there is no 'initialize'
> method. Instead, the list or environment can be populated with
> elements using a call to assayDataNew(...).
>
> An innovation is the storageMode method, which can be used to change
> how elements in assayData are stored. In particular the storageMode
> can be 'lockedEnvironment', and indeed this is the default. An
> environment is locked in the sense that new elements cannot be added
> to the environment, and existing elements cannot be changed. This
> means that the pass-by-reference semantics of environments will not
> catch users off-guard:
>
> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
> storageMode(obj) <- "environment"
> obj1 <- obj
> exprs(obj1) <- exprs(obj1)[1:10,1:5]
> dims(obj) # yikes! obj exprs dimensions changed!
>
> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
> storageMode(obj) <- "environment"
> obj1 <- obj
> exprs(obj1) <- log(exprs(obj))
> identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!
>
> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
> obj1 <- obj
> exprs(obj1) <- log(exprs(obj1))
> identical(exprs(obj1),exprs(obj)) # FALSE: good!
>
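The aliasing that the first two examples run into is ordinary R environment behaviour and can be reproduced without Biobase. What follows is only a minimal base-R sketch: the names `store`, `alias`, `lst` and `lst2` and the toy matrix are made up for illustration and are not part of the eSet API.

```r
# Environments are passed by reference: assignment creates an alias, not a copy.
store <- new.env()
assign("exprs", matrix(1:6, nrow = 3), envir = store)

alias <- store  # second name for the SAME environment
assign("exprs",
       get("exprs", envir = alias)[1:2, , drop = FALSE],
       envir = alias)

# The change made through 'alias' is visible through 'store' as well.
dim_via_store <- dim(get("exprs", envir = store))
shared <- identical(store, alias)

# Lists, by contrast, have copy semantics: modifying the copy leaves
# the original untouched.
lst <- list(exprs = matrix(1:6, nrow = 3))
lst2 <- lst
lst2$exprs <- lst2$exprs[1:2, , drop = FALSE]
dim_via_lst <- dim(lst$exprs)
```

This is the trade-off behind the storage modes described here: a plain environment avoids copies but shares state between names, while a list behaves like any other R value.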
> Note that attempts to directly change slots in locked environments
> cause an error:
>
> > assayData(obj1)$exprs <- NULL
> Error: cannot change value of a locked binding.
>
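The locking itself can be illustrated with base R's `lockEnvironment()`. This is a sketch under assumptions, not the Biobase implementation: `copyEnv` is a hypothetical helper written for this example, and the toy matrix stands in for real assay data.

```r
# A locked environment rejects changes to its bindings.
locked <- new.env()
assign("exprs", matrix(1:6, nrow = 3), envir = locked)
lockEnvironment(locked, bindings = TRUE)

# Direct modification fails because the binding is locked.
result <- tryCatch({
  assign("exprs", NULL, envir = locked)
  "modified"
}, error = function(e) "error")

# Hypothetical helper: copy every element into a fresh, unlocked environment.
copyEnv <- function(env) {
  out <- new.env()
  for (nm in ls(env, all.names = TRUE)) {
    assign(nm, get(nm, envir = env), envir = out)
  }
  out
}

# Copy, modify the copy, then lock the copy again.
updated <- copyEnv(locked)
assign("exprs", log(get("exprs", envir = locked)), envir = updated)
lockEnvironment(updated, bindings = TRUE)

# The original environment still holds the original data.
unchanged <- identical(get("exprs", envir = locked), matrix(1:6, nrow = 3))
```

Copying every element on each update is what makes repeated small replacements expensive, which is why batching the changes into a single reassignment pays off.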
> The setReplaceMethod for exprs (and assayData) succeeds by performing
> a deep copy of the entire environment. Because this is very
> inefficient, the recommended paradigm to update an element in a
> lockedEnvironment is to extract it, make many changes, and then
> reassign it, e.g.,
>
> ex <- exprs(obj1)
> # many changes, ex <- log(ex), ...
> exprs(obj1) <- ex
>
> lockedEnvironment offers some efficiency in copying objects, because
> the environment is not copied during function calls. This is not
> completely satisfactory, though:
>
> func <- function(assayData) # good: contents of env will not be copied
>   max(exprs(assayData)) # not so good: exprs copied from environment
>
>
> 3. Changes in other slots
>
> Other slots have been changed to treat variable metadata more
> efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
> simplify the type of data stored as experimentData. These changes are
> mostly in line with the web discussions.
>
>
>
> In making these changes, I have tried not to break the existing
> interface beyond what is necessary for the new functionality (e.g.,
> pData still returns the 'data' part of phenoData). One difference,
> though, is that the methods dim, ncol, etc. return a vector of
> dimensions reflecting the shared dimensionality of the assayData
> members; dims returns an array of dimensions of each element.
>
> These changes affect eSets; any difficulties you might have with
> exprSet probably reflect changes made several months ago to validity
> checking.
>
> Please let me know of any feedback,
>
> Martin
> --
>
> The original 'sample.eSet' contains four elements in the assayData
> slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
> a class, and provide initialization and validation
> methods. Optionally, update previous eSet data structures to your new
> class. For instance,
>
> setClass("SwirlSet", contains="eSet")
>
> setMethod("initialize", "SwirlSet",
>           function(.Object,
>                    phenoData = new("AnnotatedDataFrame"),
>                    experimentData = new("MIAME"),
>                    annotation = character(),
>                    R = new("matrix"),
>                    G = new("matrix"),
>                    Rb = new("matrix"),
>                    Gb = new("matrix"),
>                    ... ) {
>             callNextMethod(.Object,
>                            assayData = assayDataNew(
>                              R=R, G=G, Rb=Rb, Gb=Gb,
>                              ...),
>                            phenoData = phenoData,
>                            experimentData = experimentData,
>                            annotation = annotation)
>           })
>
> setValidity("SwirlSet", function(object) {
>   assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
> })
>
> data(sample.eSet)
> obj <- updateOldESet(sample.eSet,"SwirlSet")
>
> Seth Falcon <sfalcon at fhcrc.org> writes:
>
>   
>> Hi Florian,
>>
>> As you may have noticed, the prada package is not building against the
>> new Biobase code.
>>
>> My apologies for not keeping you more in the loop regarding the
>> refactoring of Biobase and eSet in particular.
>>
>> Here's the story:
>>
>> There has been a consensus for a while that exprSet is not general
>> enough to handle the new chip technologies that are emerging.  eSet
>> was proposed (a while ago) as a replacement, but its design has been
>> provisional.  I know you have been using it.  
>>
>> We've recently had time to revisit the design.  For the gory details,
>> see the discussion here:
>> http://wiki.fhcrc.org/bioc/Core_Bioconductor_Classes_Discussion
>>
>> Martin Morgan (cc'd) has implemented a refactored eSet along with some
>> subclasses.  See the latest Biobase svn for details.
>>
>> Briefly, the idea is that eSet is now an abstract superclass and that
>> for each technology we will have concrete subclasses.
>>
>> So to make prada work, I suspect you will need to create a subclass of
>> eSet of your own, unless one of Martin's subclasses will work for you.
>>
>> I realize this may not have been the news you were hoping for.
>> Please have a look at the changes and feel free to ask me or Martin
>> any questions (might be good to send the questions to bioc-devel,
>> however).
>>
>> Thanks,
>>
>> -- 
>> + seth
>>     


-- 
Florian Hahne
Abt. Molekulare Genomanalyse (B050)
Deutsches Krebsforschungszentrum (DKFZ)
Im Neuenheimer Feld 580
D-69120 Heidelberg
phone: 0049 6221 424764
fax: 0049 6221 422399
web: www.dkfz.de/mga


