[BioC] RE : designing an eSet derived object
Wolfgang RAFFELSBERGER
wraff at igbmc.fr
Mon Nov 22 12:44:37 CET 2010
Dear Martin,
thank you very much for your helpful input. I'm sorry I have to bug you again.
I was almost there, but at the recent Bioconductor Developer Meeting I got another interesting suggestion, which I haven't succeeded in implementing.
Briefly (if I understood correctly), the idea was to make a modified SimpleList class that checks that each element is an ExpressionSet (instead of using the SimpleList class as is). From there one might even go one step further and check that all dimensions are identical, too ...
To make the modified SimpleList I returned to the help provided in the Bioconductor PDF "Biobase development and the new eSet". But it seems I'm not getting the initialization right.
My 'problem' is that I don't want to fix in advance how many ExpressionSets will be put in the list (SimpleList), nor what their names will be. This way I hope the object will be sufficiently general to hold results from normalization methods that might become available in the future. This is quite different from the example provided in "Biobase development and the new eSet".
To link to my previous post: this (modified) SimpleList will then be used as a slot (allowing storage of data normalized by multiple methods) of another new class (the "GxSet"), alongside other slots for data-derived values (averages, etc.) and for further documentation/notes.
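A minimal sketch of the idea described above, assuming a plain constructor function instead of an "initialize" method. The validity function checks that every element is an ExpressionSet and that all elements share the same dimensions. The constructor GxSimpleList() and the element names rma and mas5 are illustrative assumptions, not part of any package.

```r
library(Biobase)
library(IRanges)   # attaches SimpleList (defined in S4Vectors in later releases)

setClass("GxSimpleList", contains = "SimpleList")

setValidity("GxSimpleList", function(object) {
    elts <- object@listData
    ## every element must be (or extend) an ExpressionSet
    isEset <- vapply(elts, is, logical(1), "ExpressionSet")
    if (!all(isEset))
        return("all elements of a 'GxSimpleList' must be 'ExpressionSet' objects")
    ## optional second step: all elements must have the same dimensions
    if (length(unique(lapply(elts, dim))) > 1L)
        return("all elements of a 'GxSimpleList' must have identical dimensions")
    TRUE
})

## hypothetical constructor: accepts any number of elements, named or not
GxSimpleList <- function(...) {
    new("GxSimpleList", listData = list(...))
}

## toy data as in the transcript below
eset1 <- new("ExpressionSet", exprs = matrix(1, 10, 4))
eset2 <- new("ExpressionSet", exprs = matrix(3, 10, 4))

lst <- GxSimpleList(rma = eset1, mas5 = eset2)
names(lst)
```

Because the elements are collected through `...`, neither their number nor their names need to be fixed when the class is defined.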
Thanks in advance for any hints,
Wolfgang
>
> require(Biobase); require(IRanges); require(affy)
> # the toy data
> eset1 <- new("ExpressionSet", exprs=matrix(1,10,4))
> pData(eset1) <- data.frame("class"=c(1,2,2,2))
>
> eset2 <- new("ExpressionSet", exprs=matrix(3,10,4))
> pData(eset2) <- data.frame("class"=c(1,2,2,2))
>
> # making the modified class
> setClass("GxSimpleList",contains="SimpleList")
[1] "GxSimpleList"
> getClass("GxSimpleList")
Class "GxSimpleList" [in ".GlobalEnv"]
Slots:
Name: listData elementMetadata elementType metadata
Class: list ANY character list
Extends:
Class "SimpleList", directly
Class "Sequence", by class "SimpleList", distance 2
Class "Annotated", by class "SimpleList", distance 3
>
> # for the "initialize" I didn't understand how to formulate it in my case (as I don't know how many elements, neither their names)
> setMethod("initialize","GxSimpleList", function(.object,...) listData = listDataNew(lapply(list(.object,...) == "ExpressionSet") ))
Error in conformMethod(signature, mnames, fnames, f, fdef, definition) :
in method for ‘initialize’ with signature ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList", ... = "GxSimpleList") omitted in the method definition cannot be in the signature
>
> setMethod("initialize","GxSimpleList", function(.object,...) {.object <- callNextMethod(.object,...)})
Error in conformMethod(signature, mnames, fnames, f, fdef, definition) :
in method for ‘initialize’ with signature ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList", ... = "GxSimpleList") omitted in the method definition cannot be in the signature
>
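Both errors above come from the formal argument name: the "initialize" generic has formals (.Object, ...), with a capital "O", and a method must use exactly those names. A minimal sketch under that assumption follows. The class name GxList is hypothetical, chosen only to keep the example separate from the class defined in this transcript.

```r
library(Biobase)
library(IRanges)   # attaches SimpleList (defined in S4Vectors in later releases)

## hypothetical class name, used only for this illustration
setClass("GxList", contains = "SimpleList")

## the formals must match the generic: (.Object, ...)
setMethod("initialize", "GxList", function(.Object, ...) {
    ## treat every argument in '...' as a list element, whatever its name
    callNextMethod(.Object, listData = list(...))
})

lst <- new("GxList", a = new("ExpressionSet"), b = new("ExpressionSet"))
names(lst)
```

One caveat with this design: the method no longer accepts slot names such as listData as arguments, which other code calling initialize() on these objects may rely on, so a plain constructor function is often the safer choice.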
> # I guess the check for ExpressionSets should go into validity
> setValidity("GxSimpleList", function(object) { # experimental
+ if(sum(!(unlist(lapply(object,function(x) class(x))) %in% "ExpressionSet")) >0) "A 'GxSimpleList' object should contain elements of class 'ExpressionSet' only !"
+ #same as ?# assayDataValidMembers(class(object), rep("ExpressionSet",length(object)))
+ })
Class "GxSimpleList" [in ".GlobalEnv"]
Slots:
Name: listData elementMetadata elementType metadata
Class: list ANY character list
Extends:
Class "SimpleList", directly
Class "Sequence", by class "SimpleList", distance 2
Class "Annotated", by class "SimpleList", distance 3
>
> # what happens ..
> lst1 = SimpleList(a=eset1, b=eset2) # OK
>
> lst2 = new("GxSimpleList",a=eset1, b=eset2) # error (due to missing "initialize" ?)
Error in initialize(value, ...) :
invalid names for slots of class "GxSimpleList": a, b
> lst3 = GxSimpleList(a=eset1, b=eset2) # error (due to missing "initialize" ?)
Error: could not find function "GxSimpleList"
>
> # for completeness ...
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] grDevices datasets splines graphics stats tcltk utils methods base
other attached packages:
[1] affy_1.28.0 IRanges_1.8.0 Biobase_2.10.0 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.35-8
loaded via a namespace (and not attached):
[1] affyio_1.18.0 cluster_1.13.1 grid_2.12.0 lattice_0.19-13 preprocessCore_1.12.0 svMisc_0.9-60
[7] tools_2.12.0
>
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC,
1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
wolfgang.raffelsberger (at ) igbmc.fr
________________________________________
From: Martin Morgan [mtmorgan at fhcrc.org]
Sent: Friday, 5 November 2010 18:33
To: Wolfgang RAFFELSBERGER
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] designing an eSet derived object
On 11/05/2010 05:02 AM, Wolfgang RAFFELSBERGER wrote:
> Dear list,
>
> basically I'm trying to design an object to contain the following
> microarray-data
> 1) "gxIndData": microarray-data normalized in parallel by (an
> array-dependent) number of n methods plus the corresponding
> expression-calls (again, <= n methods),
> 2) "gxAvData": derived values (replicate-averages, SEMs, etc),
> 3) gene/spot annotation,
> 4) sample-description,
> 5) various supplementary information (parameters, notes, versions, etc.)
>
> Overall, this is a somewhat modified/extended version of the
> Biobase eSet and I'm trying to figure out if there is a way to use
> the Biobase eSet. This way I hope to maintain a decent level of
> compatibility with other Bioconductor methods and allow code-reuse.
>
> Now I'd like to store the various sections of 1) and 2) as separate
> lists with n matrixes of values to keep things organized.
>
> According to the Vignette "Biobase development and the new eSet"
> section 5 ("Extending eSet"), I defined a new class derived from 'eSet'. But
> as soon as I integrate something different than matrixes at the level
> of 'AssayData', I get an error-message (see code below) - no matter
> if these are simply lists or custom-objects. I suppose this means
> that I would have to store all matrixes (up to 10*6methods =60
> matrixes) without further organization at the level of 'AssayData'.
eSet requires that all AssayData elements are two-dimensional with
identical dimensions, so a list-of-matrices would not work.
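A minimal sketch of this constraint, using only assayDataNew() from Biobase as it appears in the quoted code: two matrices are accepted as AssayData elements, whereas an element that is a list of matrices is rejected. The element names A and B are illustrative.

```r
library(Biobase)

## two matrices with the same dimensions: accepted
ok <- assayDataNew(A = matrix(1, 10, 4), B = matrix(2, 10, 4))

## a list of matrices has no dimensions of its own: rejected
msg <- tryCatch(
    assayDataNew(A = list(matrix(1, 10, 4), matrix(2, 10, 4))),
    error = function(e) conditionMessage(e)
)
msg
```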
> However, I'd like to keep at least one (in my case better 2) levels
> of additional arborescence to keep the data organized.
>
> So, finally I would like to integrate two new classes for 1) and 2)
> at the level of the assayData slot of my modified/new eSet.
>
> Does this mean this is not possible and that I cannot use the 'eSet'
> for my purposes ? Do I have to create a novel class somehow
> equivalent but finally incompatible to the 'eSet' ?
>
> Any suggestions/hints ?
One possibility, if this is for your own use and not as the foundation
for a package, is to use NChannelSet, where each method is a 'channel'.
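A rough sketch of the NChannelSet idea, assuming each normalization method is stored as one channel. The channel names rma and mas5 and the constant toy matrices are illustrative assumptions.

```r
library(Biobase)

m <- matrix(1, 10, 4)

## one 'channel' per normalization method; all matrices share the same dimensions
obj <- new("NChannelSet", rma = m, mas5 = m + 1)

channelNames(obj)

## values for a single method
rmaValues <- assayDataElement(obj, "rma")
dim(rmaValues)
```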
Another possibility is to create a class that extends eSet with a slot
containing, e.g., an AnnotatedDataFrame with columns describing the
AssayData, and a method to query the slot / select the appropriate
assayData elements
And perhaps what you really have is more a list of (of lists of)
ExpressionSets, each element of the list with additional information. An
approach here would use the IRanges 'SimpleList' infrastructure, e.g.,
> lst = SimpleList(a=new("ExpressionSet"), b=new("ExpressionSet"))
> elementMetadata(lst) = DataFrame(method=c("A", "B"))
> lst[elementMetadata(lst)$method == "A"]
SimpleList of length 1
names(1): a
> lst[elementMetadata(lst)$method == "A"][[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 0 samples
element names: exprs
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
Martin
>
> Thanks in advance,
> wolfgang
>
> ##
>
> require(Biobase)
> setClass("gxSet", contains = "eSet")
> setMethod("initialize", "gxSet", function(.Object, A=new("list"),B=new("list"),...) {
> callNextMethod(.Object, A=A,B=B, ...) })
> new("gxSet")
> ## produces :
> Error in function (storage.mode = c("lockedEnvironment", "environment", :
> 'AssayData' elements with invalid dimensions: 'A' 'B'
>
>
> ## ideally I'd like to use
> setClass("gxIndData",representation(SIdata="list",SIcall="list"))
> setClass("gxAvData",representation(avSI="list",expressed="list",SEM="list", conCall="list",
> FC="list",FiltFin="list",FiltSI="list",FiltOther="list"))
> setClass("gxSet", contains = "eSet")
>
> setMethod("initialize","gxSet", function(.Object,
> assayData=assayDataNew(IndData=IndData,AvData=AvData),
> IndData=new("gxIndData"), AvData=new("gxAvData"),...) {
> if(!missing(assayData) && any(!missing(IndData), !missing(AvData))) {
> warning("using 'assayData'; ignoring 'IndData', 'AvData'") }
> callNextMethod(.Object, assayData = assayData, ...)
> })
>
> new("gxSet")
> ## produces :
> Error in assayDataNew(IndData = IndData, AvData = AvData) :
> 'AssayData' elements with invalid dimensions: 'AvData' 'IndData'
>
>
> ## the alternative : an eSet 'like' but independent and incompatible object ..
> setClass("gxSet",representation(IndData="gxIndData",AvData="gxAvData",phenoData="AnnotatedDataFrame",featureData="AnnotatedDataFrame",
> experimentData="MIAME",annotation="character",protocolData="AnnotatedDataFrame",notes="list"))
>
>
>
> ## for completeness:
> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
> [4] LC_NUMERIC=C LC_TIME=French_France.1252
>
> attached base packages:
> [1] grDevices datasets splines graphics stats tcltk utils methods base
>
> other attached packages:
> [1] affy_1.28.0 Biobase_2.10.0 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.35-8
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0 cluster_1.13.1 grid_2.12.0 lattice_0.19-13 preprocessCore_1.12.0
> [6] svMisc_0.9-60 tools_2.12.0
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Wolfgang Raffelsberger, PhD
> Laboratoire de BioInformatique et Génomique Intégratives
> IGBMC,
> 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
> Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
> wolfgang.raffelsberger @ igbmc.fr
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793