[BioC] RE : designing an eSet derived object

Martin Morgan mtmorgan at fhcrc.org
Mon Nov 22 19:42:12 CET 2010


Hi Wolfgang --

On 11/22/2010 03:44 AM, Wolfgang RAFFELSBERGER wrote:
> Dear Martin,
> 
> thank you very much for your helpful input. I'm sorry I have to bug
> you again.

> I was about there, but at the recent Bioconductor Developer Meeting I
> got another interesting suggestion, which I haven't succeeded in
> implementing.

> Briefly, (if I understood right) the idea was rather to make a
> modified SimpleList class where I could check that each element is an
> ExpressionSet (instead of using the SimpleList class as is). From
> there one might even go one step further and check if all dimensions
> are identical, too ...
> 
> To make the modified SimpleList I returned to the help
> provided in the Bioconductor pdf "Biobase development and the new
> eSet". But it seems I'm not getting the initialization right.

> My 'problem' is that I don't want to fix in advance how many
> ExpressionSets will be put in the list (SimpleList), nor what
> their names will be.  This way I hope the object will be
> sufficiently general to hold results from normalization methods that
> might become available in the future. This is quite
> different from the example provided in "Biobase development and the
> new eSet".
> 
> To link to my previous post: This (modified) SimpleList will then be
> used as a slot (allowing to store data normalized by multiple
> methods) of another new class (the "GxSet"), which has other slots for
> data-derived values (averages, etc.) and further documentation/notes ...
> 
> Thanks in advance for any hints, Wolfgang

> 
> 
>> 
>> require(Biobase); require(IRanges); require(affy)
>> # the toy data
>> eset1 <- new("ExpressionSet", exprs=matrix(1,10,4))
>> pData(eset1) <- data.frame("class"=c(1,2,2,2))
>>
>> eset2 <- new("ExpressionSet", exprs=matrix(3,10,4))
>> pData(eset2) <- data.frame("class"=c(1,2,2,2))
>>
>> # making the modified class
>> setClass("GxSimpleList",contains="SimpleList")

I think the idea is

setClass("SimpleExpressionSetList", contains="SimpleList",
    prototype=prototype(elementType="ExpressionSet"))

and then you're done...

> listData1 <- list(A=new("ExpressionSet"), B=new("ExpressionSet"))
> listData2 <- list(A=new("ExpressionSet"), B=matrix())
> new("SimpleExpressionSetList", listData=listData1)
SimpleExpressionSetList of length 2
names(2): A B
> new("SimpleExpressionSetList", listData=listData2)
Error in validObject(.Object) :
  invalid class "SimpleExpressionSetList" object: the 'listData' slot
must be a list containing ExpressionSet objects
>
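If you also want a constructor that takes any number of named
ExpressionSets, plus the check for identical dimensions you mention,
something along these lines might work. It is only a sketch: the
constructor function and the extra validity method are illustrative
additions of mine, not part of IRanges, and the block repeats the class
definition above so that it can be run on its own.

```r
library(Biobase)
library(IRanges)  # provides SimpleList (newer Bioconductor defines it in S4Vectors)

setClass("SimpleExpressionSetList", contains="SimpleList",
    prototype=prototype(elementType="ExpressionSet"))

## Illustrative constructor: gathers whatever is passed in '...' into
## the listData slot, so neither the number of elements nor their
## names need to be fixed in advance.
SimpleExpressionSetList <- function(...) {
    new("SimpleExpressionSetList", listData=list(...))
}

## Illustrative additional validity check: all elements must share the
## same dimensions. The element-type check inherited from SimpleList
## still applies.
setValidity("SimpleExpressionSetList", function(object) {
    dims <- lapply(object@listData, function(x) dim(x))
    if (length(unique(dims)) > 1L)
        "all ExpressionSet elements must have identical dimensions"
    else
        TRUE
})

## usage with toy data
eset1 <- new("ExpressionSet", exprs=matrix(1, 10, 4))
eset2 <- new("ExpressionSet", exprs=matrix(3, 10, 4))
lst <- SimpleExpressionSetList(methodA=eset1, methodB=eset2)
```

Because the elements go in through listData, no "initialize" method is
needed for this.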

> [1] "GxSimpleList"
>> getClass("GxSimpleList")
> Class "GxSimpleList" [in ".GlobalEnv"]
> 
> Slots:
> 
> Name:         listData elementMetadata     elementType        metadata
> Class:            list             ANY       character            list
> 
> Extends: 
> Class "SimpleList", directly
> Class "Sequence", by class "SimpleList", distance 2
> Class "Annotated", by class "SimpleList", distance 3
>> 
>> # for the "initialize" I didn't understand how to formulate it in
>> my case (as I don't know how many elements, neither their names) 
>> setMethod("initialize","GxSimpleList", function(.object,...)
>> listData = listDataNew(lapply(list(.object,...) == "ExpressionSet")
>> ))
> Error in conformMethod(signature, mnames, fnames, f, fdef,
> definition) : in method for ‘initialize’ with signature
> ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList",
> ... = "GxSimpleList") omitted in the method definition cannot be in
> the signature
>> 
>> setMethod("initialize","GxSimpleList", function(.object,...)
>> {.object <- callNextMethod(.object,...)})
> Error in conformMethod(signature, mnames, fnames, f, fdef,
> definition) : in method for ‘initialize’ with signature
> ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList",
> ... = "GxSimpleList") omitted in the method definition cannot be in
> the signature
>> 
>> # I guess the check for ExpressionSets should go into validity
>> setValidity("GxSimpleList", function(object) {   # experimental
> +    if(sum(!(unlist(lapply(object,function(x) class(x))) %in% "ExpressionSet")) >0)
> +      "A 'GxSimpleList' object should contain elements of class 'ExpressionSet' only !"
> +    #same as ?# assayDataValidMembers(class(object), rep("ExpressionSet",length(object)))
> +    })
> Class "GxSimpleList" [in ".GlobalEnv"]
> 
> Slots:
> 
> Name:         listData elementMetadata     elementType        metadata
> Class:            list             ANY       character            list
> 
> Extends: 
> Class "SimpleList", directly
> Class "Sequence", by class "SimpleList", distance 2
> Class "Annotated", by class "SimpleList", distance 3
>> 
>> # what happens ..
>> lst1 = SimpleList(a=eset1, b=eset2)   # OK
>> 
>> lst2 = new("GxSimpleList",a=eset1, b=eset2)  # error (due to
>> missing "initialize" ?)
> Error in initialize(value, ...) : invalid names for slots of class
> "GxSimpleList": a, b
>> lst3 = GxSimpleList(a=eset1, b=eset2)        # error (due to
>> missing "initialize" ?)
> Error: could not find function "GxSimpleList"
>> 
>> # for completeness ... sessionInfo()
> R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386
> (32-bit)
> 
> locale: [1] LC_COLLATE=French_France.1252
> LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
> LC_NUMERIC=C [5] LC_TIME=French_France.1252
> 
> attached base packages: [1] grDevices datasets  splines   graphics
> stats     tcltk     utils     methods   base
> 
> other attached packages: [1] affy_1.28.0     IRanges_1.8.0
> Biobase_2.10.0  svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2
> Hmisc_3.8-3     survival_2.35-8
> 
> loaded via a namespace (and not attached): [1] affyio_1.18.0
> cluster_1.13.1        grid_2.12.0           lattice_0.19-13
> preprocessCore_1.12.0 svMisc_0.9-60 [7] tools_2.12.0
>> 
> 
> 
> 
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . Wolfgang Raffelsberger, PhD Laboratoire de BioInformatique et
> Génomique Intégratives IGBMC, 1 rue Laurent Fries,  67404 Illkirch
> Strasbourg,  France Tel (+33) 388 65 3300         Fax (+33) 388 65
> 3276 wolfgang.raffelsberger (at ) igbmc.fr
> 
> ________________________________________
> From: Martin Morgan [mtmorgan at fhcrc.org]
> Sent: Friday, 5 November 2010 18:33
> To: Wolfgang RAFFELSBERGER
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] designing an eSet derived object
> 
> On 11/05/2010 05:02 AM, Wolfgang RAFFELSBERGER wrote:
>> Dear list,
>> 
> 
>> basically I'm trying to design an object to contain the following
>> microarray data:
>> 1) "gxIndData": microarray data normalized in parallel by an
>>    (array-dependent) number of n methods, plus the corresponding
>>    expression calls (again, <= n methods),
>> 2) "gxAvData": derived values (replicate averages, SEMs, etc.),
>> 3) gene/spot annotation,
>> 4) sample description,
>> 5) various supplementary information (parameters, notes, versions, etc.)
>> 
>> Overall, this is a somewhat modified/extended version of the
>> Biobase eSet concept, and I'm trying to figure out whether there is
>> a way to use the Biobase eSet. This way I hope to maintain a decent
>> level of compatibility with other Bioconductor methods and allow
>> code reuse.
>> 
>> Now I'd like to store the various sections of 1) and 2) as
>> separate lists with n matrices of values to keep things organized.
>> 
>> According to the vignette "Biobase development and the new eSet",
>> section 5 ("Extending eSet"), I defined a new class extending 'eSet'.
>> But as soon as I integrate anything other than matrices at the
>> level of 'AssayData', I get an error message (see code below), no
>> matter whether these are simple lists or custom objects. I suppose this
>> means that I would have to store all matrices (up to 10*6 methods
>> = 60 matrices) without further organization at the level of
>> 'AssayData'.
> 
> eSet requires that all AssayData elements are two-dimensional with 
> identical dimensions, so a list-of-matrices would not work.
> 
>> However, I'd like to keep at least one (in my case better 2)
>> levels of additional hierarchy to keep the data organized.
>> 
>> So, finally I would like to integrate two new classes for 1) and
>> 2) at the level of the assayData slot of my modified/new eSet.
>> 
>> Does this mean this is not possible and that I cannot use the
>> 'eSet' for my purposes ? Do I have to create a novel class roughly
>> equivalent to, but ultimately incompatible with, the 'eSet' ?
>> 
>> Any suggestions/hints ?
> 
> One possibility, if this is for your own use and not as the
> foundation for a package, is to use NChannelSet, where each method is
> a 'channel'.
> 
> Another possibility is to create a class that extends eSet with a
> slot containing, e.g., an AnnotatedDataFrame with columns describing
> the AssayData, and a method to query the slot / select the
> appropriate assayData elements
> 
> And perhaps what you really have is more a list of (of lists of) 
> ExpressionSets, each element of the list with additional information.
> An approach here would use the IRanges 'SimpleList' infrastructure,
> e.g.,
> 
>> lst = SimpleList(a=new("ExpressionSet"), b=new("ExpressionSet")) 
>> elementMetadata(lst) = DataFrame(method=c("A", "B")) 
>> lst[elementMetadata(lst)$method == "A"]
> SimpleList of length 1
> names(1): a
>> lst[elementMetadata(lst)$method == "A"][[1]]
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 0 features, 0 samples
>   element names: exprs
> protocolData: none
> phenoData: none
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
> 
> Martin
> 
>> 
>> Thanks in advance, wolfgang
>> 
>> ##
>> 
>> require(Biobase)
>> setClass("gxSet", contains = "eSet")
>> setMethod("initialize", "gxSet",
>>     function(.Object, A=new("list"), B=new("list"), ...) {
>>         callNextMethod(.Object, A=A, B=B, ...)
>>     })
>> new("gxSet")
>> ## produces : Error in function (storage.mode = c("lockedEnvironment", "environment",  :
>> ##   'AssayData' elements with invalid dimensions: 'A' 'B'
>> 
>> 
>> ## ideally I'd like to use
>> setClass("gxIndData", representation(SIdata="list", SIcall="list"))
>> setClass("gxAvData", representation(avSI="list", expressed="list",
>>     SEM="list", conCall="list", FC="list", FiltFin="list",
>>     FiltSI="list", FiltOther="list"))
>> setClass("gxSet", contains = "eSet")
>>
>> setMethod("initialize", "gxSet",
>>     function(.Object,
>>              assayData=assayDataNew(IndData=IndData, AvData=AvData),
>>              IndData=new("gxIndData"), AvData=new("gxAvData"), ...) {
>>         if(!missing(assayData) && any(!missing(IndData), !missing(AvData))) {
>>             warning("using 'assayData'; ignoring 'IndData', 'AvData'")
>>         }
>>         callNextMethod(.Object, assayData = assayData, ...)
>>     })
>>
>> new("gxSet")
>> ## produces : Error in assayDataNew(IndData = IndData, AvData = AvData) :
>> ##   'AssayData' elements with invalid dimensions: 'AvData' 'IndData'
>> 
>> 
>> ## the alternative : an eSet 'like' but independent and
>> ## incompatible object ..
>> setClass("gxSet", representation(IndData="gxIndData", AvData="gxAvData",
>>     phenoData="AnnotatedDataFrame", featureData="AnnotatedDataFrame",
>>     experimentData="MIAME", annotation="character",
>>     protocolData="AnnotatedDataFrame", notes="list"))
>> 
>> 
>> 
>> ## for completeness: sessionInfo() R version 2.12.0 (2010-10-15) 
>> Platform: i386-pc-mingw32/i386 (32-bit)
>> 
>> locale: [1] LC_COLLATE=French_France.1252
>> LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 [4]
>> LC_NUMERIC=C                   LC_TIME=French_France.1252
>> 
>> attached base packages: [1] grDevices datasets  splines   graphics
>> stats     tcltk     utils     methods   base
>> 
>> other attached packages: [1] affy_1.28.0     Biobase_2.10.0
>> svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3
>> survival_2.35-8
>> 
>> loaded via a namespace (and not attached): [1] affyio_1.18.0
>> cluster_1.13.1        grid_2.12.0           lattice_0.19-13
>> preprocessCore_1.12.0 [6] svMisc_0.9-60         tools_2.12.0
>> 
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> . . Wolfgang Raffelsberger, PhD Laboratoire de BioInformatique et
>> Génomique Intégratives IGBMC, 1 rue Laurent Fries,  67404 Illkirch
>> Strasbourg,  France Tel (+33) 388 65 3300         Fax (+33) 388 65
>> 3276 wolfgang.raffelsberger @ igbmc.fr
>> 
>> 
> 
> 
> -- Computational Biology Fred Hutchinson Cancer Research Center 1100
> Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> 
> Location: M1-B861 Telephone: 206 667-2793


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


