[Bioc-devel] Biobase / eSet changes for this release

Rafael A. Irizarry ririzarr at jhsph.edu
Fri Apr 14 03:36:55 CEST 2006


But most applications will have both genotype calls and
copy number estimates, so why not just make SnpSet like SnpSetPlus below?

Note that in our software I will only use SnpSetPlus, and so will Rob.

-r


On Thu, 13 Apr 2006, Martin Morgan wrote:

> Hi Rafael,
>
> An approach is to create a new class derived from eSet with
> initialization and validation methods to specify the additional
> elements to be stored in assayData. For the additions you suggest, the
> following might be a useful template:
>
> setClass("SnpSetPlus", contains="eSet")
>
> setMethod("initialize", "SnpSetPlus",
>          function(.Object,
>                   phenoData = new("AnnotatedDataFrame"),
>                   experimentData = new("MIAME"),
>                   annotation = character(),
>                   call = new("matrix"),
>                   callProbability = new("matrix"),
>                   copyNumber = new("matrix"),
>                   copyNumberProbability = new("matrix"),
>                   ... ) {
>            callNextMethod(.Object,
>                           phenoData = phenoData,
>                           experimentData = experimentData,
>                           annotation = annotation,
>                           call=call,
>                           callProbability=callProbability,
>                           copyNumber=copyNumber,
>                           copyNumberProbability=copyNumberProbability,
>                           ...)
>          })
>
> setValidity("SnpSetPlus", function(object) {
>  assayDataValidMembers(assayData(object),
>                        c("call","callProbability",
>                          "copyNumber", "copyNumberProbability"))
> })
>
> obj <- new("SnpSetPlus")
>
> Martin
>
>
> Rafael A Irizarry <ririzarr at jhsph.edu> writes:
>
>> Hi!
>>
>> Looks good.
>>
>> One comment:
>> Typically, SnpSet will have more than just the allele calls and p-values.
>> It will also have a copy number estimate and its measure of
>> uncertainty. How hard is it to add these?
>> You would need one more array just like the one you used for the
>> genotype calls.
>>
>> -r
>>
>> Martin Morgan wrote:
>>
>>> Biobase/eSet developers,
>>>
>>> Here is a brief summary of the version of eSet to be included in
>>> this release; the code builds and checks without error, though missing
>>> documentation (to be corrected within the week) means that there are
>>> still warning messages during check.  The most recent changes are in
>>> svn.
>>>
>>> There is one very recent change, to the overall class structure, that
>>> we agonized over a great deal before making at the last moment.  We
>>> recognize that this is very unfortunate timing, and that it will cause
>>> needless work for bioconductors; we will help out as much as possible.
>>>
>>>
>>> There are three major changes:
>>>
>>> 1. Change in class structure.
>>>
>>> eSet -- VIRTUAL
>>>  ExpressionSet
>>>  SnpSet
>>>  (TilingSet -- not implemented)
>>>
>>> The main functionality of eSet is to coordinate assayData, phenoData,
>>> experimentData, and the annotation.  eSet is also a generalized
>>> container, with high-throughput data stored in the assayData
>>> slot. eSet is a VIRTUAL class; if you want to store and manipulate a
>>> consistent set of elements in the assay data slot you should create a
>>> subclass of eSet. An example of how to do this is below.
>>>
>>> ExpressionSet requires that the assayData slot contain matrix element
>>> 'exprs'; other elements (of dimension identical to exprs) are
>>> permitted. as(exprSet, "ExpressionSet") coerces exprSet objects to
>>> ExpressionSet, perhaps issuing warnings if ambiguities arise.
>>>
>>> library(Biobase)
>>> data(sample.exprSet)
>>> obj <- as(sample.exprSet, "ExpressionSet")
>>> obj
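To make the 'exprs' requirement concrete, here is a base-R sketch of the kind of check an ExpressionSet validity method performs. The helper validExprsMembers() is hypothetical, not a Biobase function, and it assumes the assay data is held in a plain named list of matrices; Biobase's own checks may be implemented differently.

```r
# Hypothetical helper (not part of Biobase): checks that the assay data
# contains an 'exprs' matrix and that every element has the same
# dimensions as 'exprs'. Follows the S4 validity convention of
# returning TRUE or a character string describing the problem.
validExprsMembers <- function(assay) {
  if (!("exprs" %in% names(assay)))
    return("assayData must contain an element named 'exprs'")
  d <- dim(assay[["exprs"]])
  sameDim <- vapply(assay, function(m) identical(dim(m), d), logical(1))
  if (!all(sameDim))
    return(paste("elements with dimensions differing from 'exprs':",
                 paste(names(assay)[!sameDim], collapse = ", ")))
  TRUE
}

ok  <- list(exprs = matrix(0, 4, 2), se.exprs = matrix(0, 4, 2))
bad <- list(exprs = matrix(0, 4, 2), se.exprs = matrix(0, 2, 4))
validExprsMembers(ok)   # TRUE
validExprsMembers(bad)  # message naming se.exprs
```

Returning a message rather than calling stop() lets such a helper be used directly inside a setValidity() function.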
>>>
>>> SnpSet is meant to contain SNP data in a manner analogous to
>>> ExpressionSet; 'call' and 'callProbability' are required assayData
>>> elements providing information on the call and a statement of
>>> confidence in the call. The exact structure of these matrices is not
>>> specified, but the idea is that 'call' encodes diploid genotypes.
>>>
>>> 2. Change in assayData storage
>>>
>>> The assayData slot is an AssayData class union of 'list' and
>>> 'environment'; as a class union, there is no 'initialize'
>>> method. Instead, the list or environment can be populated with
>>> elements using a call to assayDataNew(...).
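A class union is what lets a single slot hold either a list or an environment. The following is a minimal sketch using the methods package; the names AssayDataSketch and ContainerSketch are hypothetical stand-ins for illustration, not Biobase's own classes.

```r
library(methods)

# Hypothetical union mirroring the idea above: a slot of this type may
# hold either a list or an environment.
setClassUnion("AssayDataSketch", c("list", "environment"))

# A container whose slot is declared with the union type.
setClass("ContainerSketch", slots = c(assayData = "AssayDataSketch"))

asList <- new("ContainerSketch", assayData = list(exprs = matrix(0, 2, 2)))
asEnv  <- new("ContainerSketch", assayData = new.env())

# A class union is virtual: it has no instances (and so no useful
# 'initialize' method) of its own, so instantiating it fails.
failed <- inherits(try(new("AssayDataSketch"), silent = TRUE), "try-error")
```

Because the union cannot be instantiated, the list or environment has to be built first and then stored in the slot, which is the role assayDataNew(...) plays in Biobase.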
>>>
>>> An innovation is the storageMode method, which can be used to change
>>> how elements in assayData are stored. In particular the storageMode
>>> can be 'lockedEnvironment', and indeed this is the default. An
>>> environment is locked in the sense that new elements cannot be added
>>> to the environment, and existing elements cannot be changed. This
>>> means that the pass-by-reference semantics of environments will not
>>> catch users off-guard:
>>>
>>> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>> storageMode(obj) <- "environment"
>>> obj1 <- obj
>>> exprs(obj1) <- exprs(obj1)[1:10,1:5]
>>> dims(obj) # yikes! obj exprs dimensions changed!
>>>
>>> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>> storageMode(obj) <- "environment"
>>> obj1 <- obj
>>> exprs(obj1) <- log(exprs(obj))
>>> identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!
>>>
>>> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>> obj1 <- obj
>>> exprs(obj1) <- log(exprs(obj1))
>>> identical(exprs(obj1),exprs(obj)) # FALSE: good!
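The same behaviour can be seen without Biobase. This base-R sketch (plain environments standing in for assayData) shows why an ordinary environment surprises users, and how locking the environment and its bindings prevents both modification and addition of elements.

```r
# An ordinary environment is passed by reference: assigning it to a
# new name copies only the reference, so a change made through one
# name is visible through the other.
original <- matrix(1:4, nrow = 2)
e1 <- new.env()
assign("exprs", original, envir = e1)
e2 <- e1
assign("exprs", log(get("exprs", envir = e2)), envir = e2)
sharedChange <- !identical(get("exprs", envir = e1), original)  # TRUE

# Locking the environment and all of its bindings blocks changes to
# existing elements and the addition of new ones.
e3 <- new.env()
assign("exprs", original, envir = e3)
lockEnvironment(e3, bindings = TRUE)
cannotModify <- inherits(try(assign("exprs", NULL, envir = e3), silent = TRUE),
                         "try-error")
cannotAdd <- inherits(try(assign("newElement", 1, envir = e3), silent = TRUE),
                      "try-error")
```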
>>>
>>> Note that attempts to directly change elements of a locked environment
>>> cause an error:
>>>
>>> assayData(obj1)$exprs <- NULL # Error: cannot change value of a locked binding.
>>>
>>> The setReplaceMethod for exprs (and assayData) succeeds by performing
>>> a deep copy of the entire environment. Because this is very
>>> inefficient, the recommended paradigm to update an element in a
>>> lockedEnvironment is to extract it, make many changes, and then
>>> reassign it, e.g.,
>>>
>>> ex <- exprs(obj1)
>>> # many changes, ex <- log(ex), ...
>>> exprs(obj1) <- ex
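The deep-copy strategy and the extract, modify, reassign paradigm can be sketched in base R. The helper replaceInLocked() is hypothetical and only illustrates the idea; it is not Biobase's replacement method.

```r
# Hypothetical helper: "replace" an element of a locked environment by
# copying every binding into a fresh environment, changing the copy,
# and locking the result. The original environment is left untouched.
replaceInLocked <- function(env, name, value) {
  copy <- new.env(parent = emptyenv())
  for (nm in ls(env, all.names = TRUE))
    assign(nm, get(nm, envir = env), envir = copy)
  assign(name, value, envir = copy)
  lockEnvironment(copy, bindings = TRUE)
  copy
}

locked <- new.env()
assign("exprs", matrix(1:6, nrow = 2), envir = locked)
assign("se.exprs", matrix(0, nrow = 2, ncol = 3), envir = locked)
lockEnvironment(locked, bindings = TRUE)

# Extract once, make many changes, reassign once, so the whole
# environment is copied only a single time.
ex <- get("exprs", envir = locked)
ex <- log(ex)
ex <- ex - mean(ex)
updated <- replaceInLocked(locked, "exprs", ex)
```

Each call to replaceInLocked() copies every element, which is why doing the work on an extracted matrix and reassigning once is the cheaper pattern.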
>>>
>>> lockedEnvironment offers some efficiency in copying objects, because
>>> the environment is not copied during function calls. This is not
>>> completely satisfactory, though:
>>>
>>> func <- function(assayData) # good: contents of env will not be copied
>>>  max(exprs(assayData)) # not so good: exprs copied from environment
>>>
>>>
>>> 3. Changes in other slots
>>>
>>> Other slots have been changed to treat variable metadata more
>>> efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
>>> simplify the type of data stored as experimentData. These changes are
>>> mostly in line with the web discussions.
>>>
>>>
>>>
>>> In making these changes, I have tried not to break the existing
>>> interface beyond what is necessary for the new functionality (e.g.,
>>> pData still returns the 'data' part of phenoData). One difference,
>>> though, is that the methods dim, ncol, etc., return a vector of
>>> dimensions reflecting the shared dimensionality of the assayData
>>> members; dims returns an array of the dimensions of each element.
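The distinction between the shared dimensions and the per-element dimensions can be sketched in base R. The helpers sketchDim() and sketchDims() are hypothetical and assume a plain named list of matrices; they illustrate the idea rather than Biobase's dim and dims methods.

```r
# Hypothetical helper: one column of dimensions per assayData element.
sketchDims <- function(assay) sapply(assay, dim)

# Hypothetical helper: the single dimension vector shared by all
# elements, with an error if the elements disagree.
sketchDim <- function(assay) {
  d <- sketchDims(assay)
  if (!all(d == d[, 1]))
    stop("assayData elements do not share the same dimensions")
  d[, 1]
}

assay <- list(exprs = matrix(0, 5, 3), se.exprs = matrix(0, 5, 3))
sketchDims(assay)  # 2 x 2 matrix: rows are dimensions, columns are elements
sketchDim(assay)   # 5 3
```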
>>>
>>> These changes affect eSets; any difficulties you might have with
>>> exprSet probably reflect changes made several months ago to validity
>>> checking.
>>>
>>> Please let me know of any feedback,
>>>
>>> Martin
>>> --
>>>
>>> The original 'sample.eSet' contains four elements in the assayData
>>> slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
>>> a class, and provide initialization and validation
>>> methods. Optionally, update previous eSet data structures to your new
>>> class. For instance,
>>>
>>> setClass("SwirlSet", contains="eSet")
>>>
>>> setMethod("initialize", "SwirlSet",
>>>          function(.Object,
>>>                   phenoData = new("AnnotatedDataFrame"),
>>>                   experimentData = new("MIAME"),
>>>                   annotation = character(),
>>>                   R = new("matrix"),
>>>                   G = new("matrix"),
>>>                   Rb = new("matrix"),
>>>                   Gb = new("matrix"),
>>>                   ... ) {
>>>            callNextMethod(.Object,
>>>                           assayData = assayDataNew(
>>>                             R=R, G=G, Rb=Rb, Gb=Gb,
>>>                             ...),
>>>                           phenoData = phenoData,
>>>                           experimentData = experimentData,
>>>                           annotation = annotation)
>>>          })
>>>
>>> setValidity("SwirlSet", function(object) {
>>>  assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
>>> })
>>>
>>> data(sample.eSet)
>>> obj <- updateOldESet(sample.eSet,"SwirlSet")
>>>
>


