[Bioc-devel] Biobase / eSet changes for this release
Rafael A. Irizarry
ririzarr at jhsph.edu
Fri Apr 14 03:36:55 CEST 2006
But most applications will have both genotype calls and
copy number estimates, so why not just make SnpSet like SnpSetPlus below?
Note that in our software I will only use SnpSetPlus, and so will Rob.
On Thu, 13 Apr 2006, Martin Morgan wrote:
> Hi Rafael,
> An approach is to create a new class derived from eSet with
> initialization and validation methods to specify the additional
> elements to be stored in assayData. For the additions you suggest, the
> following might be a useful template:
> setClass("SnpSetPlus", contains="eSet")
> setMethod("initialize", "SnpSetPlus",
> function(.Object,
> phenoData = new("AnnotatedDataFrame"),
> experimentData = new("MIAME"),
> annotation = character(),
> call = new("matrix"),
> callProbability = new("matrix"),
> copyNumber = new("matrix"),
> copyNumberProbability = new("matrix"),
> ... ) {
> callNextMethod(.Object,
> phenoData = phenoData,
> experimentData = experimentData,
> annotation = annotation,
> call=call,
> callProbability=callProbability,
> copyNumber=copyNumber,
> copyNumberProbability=copyNumberProbability,
> ...)
> })
> setValidity("SnpSetPlus", function(object) {
> assayDataValidMembers(assayData(object),
> c("call","callProbability",
> "copyNumber", "copyNumberProbability"))
> })
> obj <- new("SnpSetPlus")
> Martin
> Rafael A Irizarry <ririzarr at jhsph.edu> writes:
>> Hi!
>> Looks good.
>> One comment:
>> Typically, SnpSet will have more than just the allele calls and p-values.
>> It will also have a copy number estimate and its measure of
>> uncertainty. How hard is it to add these?
>> You would need one more array just like the one you used for the
>> genotype calls.
>> -r
>> Martin Morgan wrote:
>>> Biobase/eSet developers,
>>> Here is a brief summary of the version of eSet to be included in
>>> this release; the code builds and checks without error, though missing
>>> documentation (to be corrected within the week) means that there are
>>> still warning messages during check. The most recent changes are in
>>> svn.
>>> There is one very recent change, to the overall class structure, that
>>> we agonized over a great deal before making at the last moment. We
>>> recognize that this is very unfortunate timing, and that it will cause
>>> needless work for bioconductors; we will help out as much as possible.
>>> There are three major changes:
>>> 1. Change in class structure.
>>> eSet -- VIRTUAL
>>> ExpressionSet
>>> SnpSet
>>> (TilingSet -- not implemented)
>>> The main functionality of eSet is to coordinate assayData, phenoData,
>>> experimentData, and the annotation. eSet is also a generalized
>>> container, with high-throughput data stored in the assayData
>>> slot. eSet is a VIRTUAL class; if you want to store and manipulate a
>>> consistent set of elements in the assay data slot you should create a
>>> subclass of eSet. An example of how to do this is below.
>>> ExpressionSet requires that the assayData slot contain matrix element
>>> 'exprs'; other elements (of dimension identical to exprs) are
>>> permitted. as(exprSet, "ExpressionSet") coerces exprSet objects to
>>> ExpressionSet, perhaps issuing warnings if ambiguities arise.
>>> library(Biobase)
>>> data(sample.exprSet)
>>> obj <- as(sample.exprSet, "ExpressionSet")
>>> obj
>>> SnpSet is meant to contain SNP data in a manner analogous to
>>> ExpressionSet; 'call' and 'callProbability' are required assayData
>>> elements providing information on the call and a statement of
>>> confidence in the call. The exact structure of these matrices is not
>>> specified, but the idea is that 'call' encodes diploid genotypes.
>>> 2. Change in assayData storage
>>> The assayData slot is an AssayData class union of 'list' and
>>> 'environment'; as a class union, there is no 'initialize'
>>> method. Instead, the list or environment can be populated with
>>> elements using a call to assayDataNew(...).
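A minimal sketch of the class-union idea in plain S4, without Biobase. The names ToyAssayData and toyAssayDataNew are hypothetical stand-ins chosen for illustration; they are not the Biobase implementations, and the real assayDataNew may behave differently.

```r
library(methods)

# Hypothetical stand-in for AssayData: a union of 'list' and 'environment'.
setClassUnion("ToyAssayData", c("list", "environment"))

# Both storage types are members of the union.
is_list <- is(list(), "ToyAssayData")
is_env  <- is(new.env(), "ToyAssayData")

# A class union is virtual, so it cannot be instantiated with new().
direct <- tryCatch({
  new("ToyAssayData")
  "created"
}, error = function(e) "virtual")

# Hypothetical constructor that populates the chosen storage type.
toyAssayDataNew <- function(storage = c("environment", "list"), ...) {
  storage <- match.arg(storage)
  elements <- list(...)
  if (storage == "list") elements else list2env(elements, envir = new.env())
}

ad <- toyAssayDataNew("list", exprs = matrix(0, 2, 2))
```

Because the union cannot be created directly, a constructor function is one natural place to decide between list and environment storage.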
>>> An innovation is the storageMode method, which can be used to change
>>> how elements in assayData are stored. In particular the storageMode
>>> can be 'lockedEnvironment', and indeed this is the default. An
>>> environment is locked in the sense that new elements cannot be added
>>> to the environment, and existing elements cannot be changed. This
>>> means that the pass-by-reference semantics of environments will not
>>> catch users off-guard:
>>> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>> storageMode(obj) <- "environment"
>>> obj1 <- obj
>>> exprs(obj1) <- exprs(obj1)[1:10,1:5]
>>> dims(obj) # yikes! obj exprs dimensions changed!
>>> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>> storageMode(obj) <- "environment"
>>> obj1 <- obj
>>> exprs(obj1) <- log(exprs(obj))
>>> identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!
>>> obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
>>> obj1 <- obj
>>> exprs(obj1) <- log(exprs(obj1))
>>> identical(exprs(obj1),exprs(obj)) # FALSE: good!
>>> Note that attempts to directly change slots in locked environments
>>> cause an error
>>> > assayData(obj1)$exprs <- NULL
>>> Error: cannot change value of a locked binding.
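The same behaviour can be reproduced with base R alone. The sketch below uses an ordinary environment with a hypothetical element named exprs; it illustrates reference semantics and locking in general and is not the Biobase code path.

```r
# Environments have reference semantics: 'alias' and 'env' are the same object.
env <- new.env()
assign("exprs", matrix(1:6, nrow = 2), envir = env)
alias <- env
assign("exprs", matrix(0, 1, 1), envir = alias)

# The change made through 'alias' is visible through 'env'.
shared_change <- identical(dim(get("exprs", envir = env)), c(1L, 1L))

# Locking the environment and its bindings turns such writes into errors.
lockEnvironment(env, bindings = TRUE)
result <- tryCatch({
  assign("exprs", matrix(1, 1, 1), envir = alias)
  "modified"
}, error = function(e) "blocked")
```

After locking, a write through either name fails, so a copy that shares the environment can no longer silently alter the original.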
>>> The setReplaceMethod for exprs (and assayData) succeeds by performing
>>> a deep copy of the entire environment. Because this is very
>>> inefficient, the recommended paradigm to update an element in a
>>> lockedEnvironment is to extract it, make many changes, and then
>>> reassign it, e.g.,
>>> ex <- exprs(obj1)
>>> # many changes, ex <- log(ex), ...
>>> exprs(obj1) <- ex
>>> lockedEnvironment offers some efficiency in copying objects, because
>>> the environment is not copied during function calls. This is not
>>> completely satisfactory, though
>>> func <- function(assayData) # good: contents of env will not be copied
>>> max(exprs(assayData)) # not so good: exprs copied from environment
>>> 3. Changes in other slots
>>> Other slots have been changed to treat variable metadata more
>>> efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
>>> simplify the type of data stored as experimentData. These changes are
>>> mostly in line with the web discussions.
>>> In making these changes, I have tried not to break the existing
>>> interface beyond what is necessary for the new functionality (e.g.,
>>> pData still returns the 'data' part of phenoData). One difference,
>>> though, is that the methods dim, ncol, etc return a vector of
>>> dimensions reflecting the shared dimensionality of the assayData
>>> members; dims returns an array of dimensions of each element.
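To make the distinction concrete, here is a base-R sketch using a plain named list of matrices as a hypothetical stand-in for assayData. It only mimics the shape of the two results and does not call the Biobase dim or dims methods.

```r
# Hypothetical stand-in for assayData: a named list of equally sized matrices.
elements <- list(R = matrix(0, nrow = 4, ncol = 3),
                 G = matrix(0, nrow = 4, ncol = 3))

# One column per element, one row per dimension (analogous to dims()).
per_element <- sapply(elements, dim)
rownames(per_element) <- c("rows", "cols")

# A single shared dimension vector (analogous to dim()); this is only
# meaningful because every element has the same shape.
shared <- per_element[, 1]
stopifnot(all(per_element == shared))

print(per_element)
print(shared)
```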
>>> These changes affect eSets; any difficulties you might have with
>>> exprSet probably reflect changes made several months ago to validity
>>> checking.
>>> Please let me know of any feedback,
>>> Martin
>>> --
>>> The original 'sample.eSet' contains four elements in the assayData
>>> slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
>>> a class, and provide initialization and validation
>>> methods. Optionally, update previous eSet data structures to your new
>>> class. For instance,
>>> setClass("SwirlSet", contains="eSet")
>>> setMethod("initialize", "SwirlSet",
>>> function(.Object,
>>> phenoData = new("AnnotatedDataFrame"),
>>> experimentData = new("MIAME"),
>>> annotation = character(),
>>> R = new("matrix"),
>>> G = new("matrix"),
>>> Rb = new("matrix"),
>>> Gb = new("matrix"),
>>> ... ) {
>>> callNextMethod(.Object,
>>> assayData = assayDataNew(
>>> R=R, G=G, Rb=Rb, Gb=Gb,
>>> ...),
>>> phenoData = phenoData,
>>> experimentData = experimentData,
>>> annotation = annotation)
>>> })
>>> setValidity("SwirlSet", function(object) {
>>> assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
>>> })
>>> data(sample.eSet)
>>> obj <- updateOldESet(sample.eSet,"SwirlSet")
>>> _______________________________________________
>>> Bioc-devel at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel