[BioC] RE : designing an eSet derived object

Wolfgang RAFFELSBERGER wraff at igbmc.fr
Wed Nov 24 14:47:31 CET 2010


Dear Martin,

thanks again - I've got things working as you explained.

Just to make sure I understood completely:
Everything is now streamlined for storing the multiple ExpressionSets for the various methods employed (the 1st slot in my GxSet).
The next step is to review how I store the "derived" data (e.g. averages, SEM, ... for each of the methods above). I've tried a few things, but as far as I understand, no existing class is close enough to my case (ideally a "SimpleListList" = list of SimpleLists). So I made a new class containing multiple SimpleList objects (code below):

setClass("GxAvData",representation(avSI="SimpleList",expressed="SimpleList",SEM="SimpleList", 
   FC="SimpleList",FiltFin="SimpleList",FiltSI="SimpleList",FiltOther="SimpleList"))     

I've also tried to use the SimpleMatrixList class, since all my (final) data are nothing but matrices, but I couldn't get it to work. Does this matter much?
Or should I rather first define a general "SimpleListList" (list of SimpleLists) and then derive my specific class ("GxAvData") from it?
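
[Editor's sketch] One way to picture the "general container first, specific class second" option raised above, using only the base 'methods' package. This is a minimal illustration, not the approach recommended in the thread: the class name "MatrixList", the helper MatrixList() and the method names "rma"/"mas5" are hypothetical, and only three of the GxAvData slots are shown.

```r
library(methods)

# Hypothetical general container: a list whose elements must all be matrices.
setClass("MatrixList", contains = "list",
    validity = function(object) {
        ok <- vapply(object@.Data, is.matrix, logical(1))
        if (all(ok)) TRUE
        else "all elements of a 'MatrixList' must be matrices"
    })

# Hypothetical constructor helper, so users need not call new() directly.
MatrixList <- function(...) new("MatrixList", list(...))

# Specific class: each slot reuses the general container
# (only a subset of the slots from the message above).
setClass("GxAvData",
    representation(avSI = "MatrixList", SEM = "MatrixList", FC = "MatrixList"))

av <- new("GxAvData",
    avSI = MatrixList(rma = matrix(1, 10, 2), mas5 = matrix(2, 10, 2)),
    SEM  = MatrixList(rma = matrix(0.1, 10, 2), mas5 = matrix(0.2, 10, 2)))

# A non-matrix element is rejected by the validity method;
# here the error message is captured instead of stopping the script.
bad <- tryCatch(MatrixList(a = matrix(1, 2, 2), b = "not a matrix"),
    error = function(e) conditionMessage(e))
```

The design point is that the element check lives in one place (the container), so GxAvData itself needs no validity code for it; slots that are not supplied (FC here) default to an empty, valid MatrixList.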


Thanks for all your helpful comments,

Wolfgang

PS: Hope you had a good trip back to the US.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC,
1 rue Laurent Fries,  67404 Illkirch  Strasbourg,  France
Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
wolfgang.raffelsberger (at ) igbmc.fr

________________________________________
From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] on behalf of Martin Morgan [mtmorgan at fhcrc.org]
Sent: Monday, 22 November 2010 19:42
To: Wolfgang RAFFELSBERGER
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] RE :  designing an eSet derived object

Hi Wolfgang --

On 11/22/2010 03:44 AM, Wolfgang RAFFELSBERGER wrote:
> Dear Martin,
>
> thank you very much for your helpful input. I'm sorry I have to bug
> you again.

> I was almost there, but at the recent Bioconductor Developer Meeting I
> got another interesting suggestion, which I haven't succeeded in
> implementing.

> Briefly, (if I understood correctly) the idea was rather to make a
> modified SimpleList class where I could check that each element is an
> ExpressionSet (instead of using the SimpleList class as is). From
> there one might even go one step further and check whether all dimensions
> are identical, too ...
>
> To make the modified SimpleList I returned to the help
> provided in the Bioconductor pdf "Biobase development and the new
> eSet". But it seems I'm not getting the initialization right.

> My 'problem' is that I don't want to fix in advance how many
> ExpressionSets will be put in the list (SimpleList), nor what
> their names will be.  This way I hope the object will be
> sufficiently general to hold results from normalization methods that
> might become available in the future. This is quite
> different from the example provided in "Biobase development and the
> new eSet".
>
> To link to my previous post: this (modified) SimpleList will then be
> used as a slot (allowing storage of data normalized by multiple
> methods) of another new class (the "GxSet"), alongside other slots for
> data-derived values (averages, etc.) and further documentation/notes.
>
> Thanks in advance for any hints, Wolfgang

>
>
>>
>> require(Biobase); require(IRanges); require(affy) # the toy data
>> eset1 <- new("ExpressionSet", exprs=matrix(1,10,4))
>> pData(eset1) <- data.frame("class"=c(1,2,2,2))
>>
>> eset2 <- new("ExpressionSet", exprs=matrix(3,10,4))
>> pData(eset2) <- data.frame("class"=c(1,2,2,2))
>>
>> # making the modified class
>> setClass("GxSimpleList",contains="SimpleList")

I think the idea is

setClass("SimpleExpressionSetList", contains="SimpleList",
    prototype=prototype(elementType="ExpressionSet"))

and then you're done...

> listData1 <- list(A=new("ExpressionSet"), B=new("ExpressionSet"))
> listData2 <- list(A=new("ExpressionSet"), B=matrix())
> new("SimpleExpressionSetList", listData=listData1)
SimpleExpressionSetList of length 2
names(2): A B
> new("SimpleExpressionSetList", listData=listData2)
Error in validObject(.Object) :
  invalid class "SimpleExpressionSetList" object: the 'listData' slot
must be a list containing ExpressionSet objects
>
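
[Editor's sketch] The two loose ends from the quoted message (a constructor function, and the optional check that all elements have identical dimensions) could be layered on top of the class above roughly as follows. Assumptions: the constructor SimpleExpressionSetList() and the extra validity method are illustrative and not part of any package; SimpleList is provided by S4Vectors in current Bioconductor releases, whereas at the time of this thread it came from IRanges; the element names "rma" and "mas5" are made up.

```r
library(Biobase)    # ExpressionSet
library(S4Vectors)  # SimpleList (was in IRanges when this thread was written)

setClass("SimpleExpressionSetList", contains = "SimpleList",
    prototype = prototype(elementType = "ExpressionSet"))

# Additional, illustrative validity check: every ExpressionSet in the list
# must have the same number of features and samples.
setValidity("SimpleExpressionSetList", function(object) {
    dims <- lapply(as.list(object), dim)
    if (length(unique(dims)) <= 1L) TRUE
    else "all ExpressionSets must have identical dimensions"
})

# Hypothetical constructor: accepts any number of named ExpressionSets,
# so neither the count nor the names are fixed in advance.
SimpleExpressionSetList <- function(...)
    new("SimpleExpressionSetList", listData = list(...))

eset1 <- new("ExpressionSet", exprs = matrix(1, 10, 4))
eset2 <- new("ExpressionSet", exprs = matrix(3, 10, 4))
lst <- SimpleExpressionSetList(rma = eset1, mas5 = eset2)

# Mismatched dimensions should be rejected; capture the message for inspection.
eset3 <- new("ExpressionSet", exprs = matrix(5, 8, 4))
msg <- tryCatch(SimpleExpressionSetList(rma = eset1, other = eset3),
    error = function(e) conditionMessage(e))
```

No custom "initialize" method is needed in this sketch: the element-type check is inherited from SimpleList via 'elementType', and the constructor simply forwards its arguments as the 'listData' slot.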

> [1] "GxSimpleList"
>> getClass("GxSimpleList")
> Class "GxSimpleList" [in ".GlobalEnv"]
>
> Slots:
>
> Name:         listData elementMetadata     elementType        metadata
> Class:            list             ANY       character            list
>
> Extends:
> Class "SimpleList", directly
> Class "Sequence", by class "SimpleList", distance 2
> Class "Annotated", by class "SimpleList", distance 3
>>
>> # for the "initialize" I didn't understand how to formulate it in
>> # my case (as I don't know how many elements, nor their names)
>> setMethod("initialize","GxSimpleList", function(.object,...)
>> listData = listDataNew(lapply(list(.object,...) == "ExpressionSet")
>> ))
> Error in conformMethod(signature, mnames, fnames, f, fdef,
> definition) : in method for ‘initialize’ with signature
> ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList",
> ... = "GxSimpleList") omitted in the method definition cannot be in
> the signature
>>
>> setMethod("initialize","GxSimpleList", function(.object,...)
>> {.object <- callNextMethod(.object,...)})
> Error in conformMethod(signature, mnames, fnames, f, fdef,
> definition) : in method for ‘initialize’ with signature
> ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList",
> ... = "GxSimpleList") omitted in the method definition cannot be in
> the signature
>>
>> # I guess the check for ExpressionSets should go into validity
>> setValidity("GxSimpleList", function(object) {   # experimental
> +    if(sum(!(unlist(lapply(object,function(x) class(x))) %in% "ExpressionSet")) >0)
> +       "A 'GxSimpleList' object should contain elements of class 'ExpressionSet' only !"
> +    #same as ?# assayDataValidMembers(class(object), rep("ExpressionSet",length(object)))
> +    })
> Class "GxSimpleList" [in ".GlobalEnv"]
>
> Slots:
>
> Name:         listData elementMetadata     elementType        metadata
> Class:            list             ANY       character            list
>
> Extends:
> Class "SimpleList", directly
> Class "Sequence", by class "SimpleList", distance 2
> Class "Annotated", by class "SimpleList", distance 3
>>
>> # what happens ..
>> lst1 = SimpleList(a=eset1, b=eset2)   # OK
>>
>> lst2 = new("GxSimpleList",a=eset1, b=eset2)  # error (due to
>> missing "initialize" ?)
> Error in initialize(value, ...) : invalid names for slots of class
> "GxSimpleList": a, b
>> lst3 = GxSimpleList(a=eset1, b=eset2)        # error (due to
>> missing "initialize" ?)
> Error: could not find function "GxSimpleList"
>>
>> # for completeness ...
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
> [5] LC_TIME=French_France.1252
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
>
> other attached packages:
> [1] affy_1.28.0     IRanges_1.8.0   Biobase_2.10.0  svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.35-8
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0         cluster_1.13.1        grid_2.12.0           lattice_0.19-13       preprocessCore_1.12.0 svMisc_0.9-60
> [7] tools_2.12.0
>>
>
>
>
>
> ________________________________________
> From: Martin Morgan [mtmorgan at fhcrc.org]
> Sent: Friday, 5 November 2010 18:33
> To: Wolfgang RAFFELSBERGER
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] designing an eSet derived object
>
> On 11/05/2010 05:02 AM, Wolfgang RAFFELSBERGER wrote:
>> Dear list,
>>
>
>> basically I'm trying to design an object to contain the following
>> microarray data:
>> 1) "gxIndData": microarray data normalized in parallel by an
>>    (array-dependent) number n of methods, plus the corresponding
>>    expression calls (again, <= n methods),
>> 2) "gxAvData": derived values (replicate averages, SEMs, etc.),
>> 3) gene/spot annotation,
>> 4) sample description,
>> 5) various supplementary information (parameters, notes, versions, etc.)
>>
>> Overall, this is a somewhat modified/extended version of the
>> Biobase eSet concept, and I'm trying to figure out if there is a way
>> to use the Biobase eSet. This way I hope to maintain a decent level of
>> compatibility with other Bioconductor methods and allow
>> code reuse.
>>
>> Now I'd like to store the various sections of 1) and 2) as
>> separate lists with n matrices of values to keep things organized.
>>
>> Following section 5 ("Extending eSet") of the vignette "Biobase
>> development and the new eSet", I defined a new class extending 'eSet'.
>> But as soon as I integrate anything other than matrices at the
>> level of 'AssayData', I get an error message (see code below) - no
>> matter whether these are simple lists or custom objects. I suppose this
>> means that I would have to store all matrices (up to 10*6 methods
>> = 60 matrices) without further organization at the level of
>> 'AssayData'.
>
> eSet requires that all AssayData elements are two-dimensional with
> identical dimensions, so a list-of-matrices would not work.
>
>> However, I'd like to keep at least one (in my case preferably 2)
>> additional levels of hierarchy to keep the data organized.
>>
>> So, finally, I would like to integrate two new classes for 1) and
>> 2) at the level of the assayData slot of my modified/new eSet.
>>
>> Does this mean this is not possible and that I cannot use the
>> 'eSet' for my purposes? Do I have to create a new class that is
>> roughly equivalent to, but ultimately incompatible with, the 'eSet'?
>>
>> Any suggestions/hints ?
>
> One possibility, if this is for your own use and not as the
> foundation for a package, is to use NChannelSet, where each method is
> a 'channel'.
>
> Another possibility is to create a class that extends eSet with a
> slot containing, e.g., an AnnotatedDataFrame with columns describing
> the AssayData, and a method to query the slot / select the
> appropriate assayData elements
>
> And perhaps what you really have is more a list of (of lists of)
> ExpressionSets, each element of the list with additional information.
> An approach here would use the IRanges 'SimpleList' infrastructure,
> e.g.,
>
>> lst = SimpleList(a=new("ExpressionSet"), b=new("ExpressionSet"))
>> elementMetadata(lst) = DataFrame(method=c("A", "B"))
>> lst[elementMetadata(lst)$method == "A"]
> SimpleList of length 1
> names(1): a
>> lst[elementMetadata(lst)$method == "A"][[1]]
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 0 features, 0 samples
>   element names: exprs
> protocolData: none
> phenoData: none
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
>
> Martin
>
>>
>> Thanks in advance, wolfgang
>>
>> ##
>>
>> require(Biobase)
>> setClass("gxSet", contains = "eSet")
>> setMethod("initialize", "gxSet",
>>   function(.Object, A=new("list"), B=new("list"), ...) {
>>     callNextMethod(.Object, A=A, B=B, ...)
>>   })
>> new("gxSet")
>> ## produces : Error in function (storage.mode = c("lockedEnvironment", "environment",  :
>> ##   'AssayData' elements with invalid dimensions: 'A' 'B'
>>
>>
>> ## ideally I'd like to use
>> setClass("gxIndData",representation(SIdata="list",SIcall="list"))
>> setClass("gxAvData",representation(avSI="list",expressed="list",SEM="list",
>>   conCall="list",FC="list",FiltFin="list",FiltSI="list",FiltOther="list"))
>> setClass("gxSet", contains = "eSet")
>>
>> setMethod("initialize","gxSet",
>>   function(.Object,
>>     assayData=assayDataNew(IndData=IndData,AvData=AvData),
>>     IndData=new("gxIndData"), AvData=new("gxAvData"),...) {
>>     if(!missing(assayData) && any(!missing(IndData), !missing(AvData))) {
>>       warning("using 'assayData'; ignoring 'IndData', 'AvData'")
>>     }
>>     callNextMethod(.Object, assayData = assayData, ...)
>>   })
>>
>> new("gxSet")
>> ## produces : Error in assayDataNew(IndData = IndData, AvData = AvData) :
>> ##   'AssayData' elements with invalid dimensions: 'AvData' 'IndData'
>>
>>
>> ## the alternative : an eSet-like but independent and
>> ## incompatible object ..
>> setClass("gxSet",representation(IndData="gxIndData",AvData="gxAvData",
>>   phenoData="AnnotatedDataFrame",featureData="AnnotatedDataFrame",
>>   experimentData="MIAME",annotation="character",
>>   protocolData="AnnotatedDataFrame",notes="list"))
>>
>>
>>
>> ## for completeness:
>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
>> [4] LC_NUMERIC=C                   LC_TIME=French_France.1252
>>
>> attached base packages:
>> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
>>
>> other attached packages:
>> [1] affy_1.28.0     Biobase_2.10.0  svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.35-8
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.18.0         cluster_1.13.1        grid_2.12.0           lattice_0.19-13       preprocessCore_1.12.0
>> [6] svMisc_0.9-60         tools_2.12.0
>>
>
>


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


