[BioC] RE : designing an eSet derived object
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 25 20:31:28 CET 2010
On 11/24/2010 05:47 AM, Wolfgang RAFFELSBERGER wrote:
> Dear Martin,
>
> thanks again; I've got things working as you explained.
>
> Just to make sure I understood completely: everything is now
> streamlined for storing the multiple ExpressionSets for the
> various methods employed (the 1st slot in my GxSet). The next step is
> to review how I'm storing the "derived" data (e.g. averages,
> SEM, ... for each of the methods above). Here I've tried a few
> things, but as far as I understand, there is no existing
> class close enough to my case (ideally a "SimpleListList", i.e. a list of
> SimpleLists). So I made a new class containing multiple SimpleList
> objects (code below):
>
> setClass("GxAvData", representation(avSI="SimpleList", expressed="SimpleList",
>     SEM="SimpleList", FC="SimpleList", FiltFin="SimpleList", FiltSI="SimpleList",
>     FiltOther="SimpleList"))
>
>
> I've also tried to use the SimpleMatrixList class, since all my
> (final) data are nothing but matrices, but I didn't get it working.
> Does this matter much? Or should I rather first define a general
> "SimpleListList" (list of SimpleLists) and derive my specific
> class ("GxAvData") from it?
It seems like your class has a well-defined number of 'SimpleList'
slots, so your setClass above seems appropriate.
If I define

setClass("SimpleMatrixList", contains="SimpleList",
    prototype=prototype(elementType="matrix"))
SimpleMatrixList <-
    function(...) new("SimpleMatrixList", listData=list(...))

then things seem to work:
> mlst <- SimpleMatrixList(a=matrix(0, 5, 5), b=matrix(1, 5, 5))
> mlst[["b"]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 1 1 1 1 1
[3,] 1 1 1 1 1
[4,] 1 1 1 1 1
[5,] 1 1 1 1 1
> mlst <- SimpleMatrixList(c=data.frame())
Error in validObject(.Object) :
invalid class "SimpleMatrixList" object: the 'listData' slot must be a
list containing matrix objects
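The same mechanism (a class that records the required element class, a validity method that checks every element against it, and a constructor that wraps ... in a list) can be sketched with the methods package alone, for readers without IRanges installed. This is a minimal sketch under that assumption, not the IRanges SimpleList code; the class names 'MatrixList' and 'AvDataSketch' are hypothetical, and 'AvDataSketch' only mimics two of the GxAvData slots to show a class with a fixed number of typed-list slots.

```r
library(methods)

## Hypothetical list class that stores the required element class in a
## slot, loosely following the 'elementType' idea used by SimpleList.
setClass("MatrixList",
         representation(listData = "list", elementType = "character"),
         prototype(elementType = "matrix"))

## Validity: every element of listData must be an instance of elementType.
setValidity("MatrixList", function(object) {
    ok <- vapply(object@listData, is, logical(1), object@elementType)
    if (all(ok)) TRUE
    else paste("the 'listData' slot must be a list containing",
               object@elementType, "objects")
})

## Constructor: named arguments become list elements, not slot names.
MatrixList <- function(...) new("MatrixList", listData = list(...))

## Container with a fixed set of typed-list slots (two of the GxAvData
## slots, for illustration only).
setClass("AvDataSketch",
         representation(avSI = "MatrixList", SEM = "MatrixList"))

m <- MatrixList(a = matrix(0, 5, 5), b = matrix(1, 5, 5))
av <- new("AvDataSketch", avSI = m, SEM = m)
## a data.frame element is rejected when the object is validated
bad <- tryCatch(MatrixList(c = data.frame()), error = function(e) e)
```

Keeping the element class in a slot means one validity function serves any element type; a subclass would only need a different prototype.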
Martin
>
>
> Thanks for all your helpful comments,
>
> Wolfgang
>
> PS: Hope you had a good trip back to the US.
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . Wolfgang Raffelsberger, PhD Laboratoire de BioInformatique et
> Génomique Intégratives IGBMC, 1 rue Laurent Fries, 67404 Illkirch
> Strasbourg, France Tel (+33) 388 65 3300 Fax (+33) 388 65
> 3276 wolfgang.raffelsberger (at ) igbmc.fr
>
> ________________________________________
> From: bioconductor-bounces at stat.math.ethz.ch
> [bioconductor-bounces at stat.math.ethz.ch] on behalf of Martin Morgan
> [mtmorgan at fhcrc.org]
> Sent: Monday, 22 November 2010 19:42
> To: Wolfgang RAFFELSBERGER
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] RE : designing an eSet derived object
>
> Hi Wolfgang --
>
> On 11/22/2010 03:44 AM, Wolfgang RAFFELSBERGER wrote:
>> Dear Martin,
>>
>> thank you very much for your helpful input. I'm sorry I have to
>> bug you again.
>
>> I was nearly there, but at the recent Bioconductor Developer Meeting
>> I got another interesting suggestion, which I haven't succeeded in
>> implementing.
>
>> Briefly (if I understood correctly), the idea was rather to make a
>> modified SimpleList class that checks that each element is
>> an ExpressionSet (instead of using the SimpleList class as is).
>> From there one might even go one step further and check that all
>> dimensions are identical, too ...
>>
>> For making the modified SimpleList I went back to the help
>> provided in the Bioconductor pdf "Biobase development and the new
>> eSet". But it seems I'm not getting the initialization right.
>
>> My 'problem' is that I don't want to fix in advance how many
>> ExpressionSets will be put in the list (SimpleList), nor what
>> their names will be. This way I hope the object will be
>> sufficiently general to hold results from normalization methods
>> that might become available in the future. This is quite
>> different from the example provided in "Biobase development and the
>> new eSet".
>>
>> To link to my previous post: this (modified) SimpleList will then
>> be used as a slot (storing data normalized by multiple
>> methods) of another new class (the "GxSet"), alongside other slots
>> for derived values (averages, etc.) and further
>> documentation/notes ...
>>
>> Thanks in advance for any hints, Wolfgang
>
>>
>>
>>>
>>> require(Biobase); require(IRanges); require(affy)
>>> # the toy data
>>> eset1 <- new("ExpressionSet", exprs=matrix(1,10,4))
>>> pData(eset1) <- data.frame("class"=c(1,2,2,2))
>>>
>>> eset2 <- new("ExpressionSet", exprs=matrix(3,10,4))
>>> pData(eset2) <- data.frame("class"=c(1,2,2,2))
>>>
>>> # making the modified class
>>> setClass("GxSimpleList",contains="SimpleList")
>
> I think the idea is
>
> setClass("SimpleExpressionSetList", contains="SimpleList",
> prototype=prototype(elementType="ExpressionSet"))
>
> and then you're done...
>
>> listData1 <- list(A=new("ExpressionSet"), B=new("ExpressionSet"))
>> listData2 <- list(A=new("ExpressionSet"), B=matrix())
>> new("SimpleExpressionSetList", listData=listData1)
> SimpleExpressionSetList of length 2
> names(2): A B
>> new("SimpleExpressionSetList", listData=listData2)
> Error in validObject(.Object) : invalid class
> "SimpleExpressionSetList" object: the 'listData' slot must be a list
> containing ExpressionSet objects
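Wolfgang's "one step further" (also checking that all elements have identical dimensions) can go into a validity method in the same way. The sketch below is a minimal illustration using plain matrices instead of ExpressionSets, so that it runs with the methods package alone; the class name 'SameDimMatrixList' is hypothetical. For ExpressionSets one would presumably compare dim() of each element in the same manner, but that variant is an untested assumption here.

```r
library(methods)

## Hypothetical class: a list of matrices that must all share one dim().
setClass("SameDimMatrixList", representation(listData = "list"))

setValidity("SameDimMatrixList", function(object) {
    x <- object@listData
    if (!all(vapply(x, is.matrix, logical(1))))
        return("all elements must be matrices")
    if (length(x) > 1L) {
        ## compare every element's dimensions with those of the first
        d1 <- dim(x[[1L]])
        same <- vapply(x, function(m) identical(dim(m), d1), logical(1))
        if (!all(same))
            return("all elements must have identical dimensions")
    }
    TRUE
})

SameDimMatrixList <- function(...)
    new("SameDimMatrixList", listData = list(...))

ok <- SameDimMatrixList(a = matrix(1, 10, 4), b = matrix(3, 10, 4))
bad <- tryCatch(SameDimMatrixList(a = matrix(1, 10, 4), b = matrix(3, 5, 4)),
                error = function(e) e)
```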
>>
>
>> [1] "GxSimpleList"
>>> getClass("GxSimpleList")
>> Class "GxSimpleList" [in ".GlobalEnv"]
>>
>> Slots:
>>
>> Name: listData elementMetadata elementType metadata
>> Class: list ANY character list
>>
>> Extends:
>> Class "SimpleList", directly
>> Class "Sequence", by class "SimpleList", distance 2
>> Class "Annotated", by class "SimpleList", distance 3
>>>
>>> # for the "initialize" I didn't understand how to formulate it in my case
>>> # (as I know neither how many elements there will be, nor their names)
>>> setMethod("initialize","GxSimpleList",
>>>     function(.object,...) listData =
>>>     listDataNew(lapply(list(.object,...) == "ExpressionSet") ))
>> Error in conformMethod(signature, mnames, fnames, f, fdef,
>> definition) : in method for ‘initialize’ with signature
>> ‘.Object="GxSimpleList"’: formal arguments (.Object =
>> "GxSimpleList", ... = "GxSimpleList") omitted in the method
>> definition cannot be in the signature
>>>
>>> setMethod("initialize","GxSimpleList", function(.object,...)
>>> {.object <- callNextMethod(.object,...)})
>> Error in conformMethod(signature, mnames, fnames, f, fdef,
>> definition) : in method for ‘initialize’ with signature
>> ‘.Object="GxSimpleList"’: formal arguments (.Object =
>> "GxSimpleList", ... = "GxSimpleList") omitted in the method
>> definition cannot be in the signature
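Both conformMethod() errors above appear to come from the first formal argument being spelled '.object': the generic initialize() has the formals (.Object, ...), and an S4 method has to use the generic's argument names. A minimal sketch of a method that is accepted, using a hypothetical stand-in class 'TypedList' built with the methods package only (so it runs without IRanges). For a simple list class an initialize method is often unnecessary, since the default one already fills slots from named arguments.

```r
library(methods)

## Stand-in list-based class (hypothetical; not IRanges' SimpleList).
setClass("TypedList", representation(listData = "list"))

## Formal arguments must match the generic initialize(.Object, ...):
## note the capital 'O' in .Object.
setMethod("initialize", "TypedList", function(.Object, ...) {
    ## let the default method fill the slots (and run validity checks)
    .Object <- callNextMethod(.Object, ...)
    ## any additional set-up would go here
    .Object
})

x <- new("TypedList", listData = list(a = 1, b = 2))
```

The method must return the (possibly modified) object; new() uses that return value as the result.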
>>>
>>> # I guess the check for ExpressionSets should go into validity
>>> setValidity("GxSimpleList", function(object) { # experimental
>> +   if(sum(!(unlist(lapply(object,function(x) class(x))) %in% "ExpressionSet")) >0)
>> +     "A 'GxSimpleList' object should contain elements of class 'ExpressionSet' only !"
>> +   #same as ?# assayDataValidMembers(class(object), rep("ExpressionSet",length(object)))
>> + })
>> Class "GxSimpleList" [in ".GlobalEnv"]
>>
>> Slots:
>>
>> Name: listData elementMetadata elementType metadata
>> Class: list ANY character list
>>
>> Extends:
>> Class "SimpleList", directly
>> Class "Sequence", by class "SimpleList", distance 2
>> Class "Annotated", by class "SimpleList", distance 3
>>>
>>> # what happens ..
>>> lst1 = SimpleList(a=eset1, b=eset2) # OK
>>>
>>> lst2 = new("GxSimpleList",a=eset1, b=eset2) # error (due to missing "initialize" ?)
>> Error in initialize(value, ...) : invalid names for slots of class
>> "GxSimpleList": a, b
>>> lst3 = GxSimpleList(a=eset1, b=eset2) # error (due to missing "initialize" ?)
>> Error: could not find function "GxSimpleList"
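The two remaining errors have separate causes: new() interprets named arguments as slot names (hence "invalid names for slots ...: a, b"), and setClass() does not by itself create a function named GxSimpleList(), so one has to be written by hand, following the pattern of the SimpleMatrixList() constructor shown at the top of this thread. A minimal base-R sketch with a hypothetical stand-in class 'GxListSketch', using plain matrices instead of ExpressionSets so that it runs without Biobase:

```r
library(methods)

## Hypothetical list-based class standing in for GxSimpleList.
setClass("GxListSketch", representation(listData = "list"))

## new() treats named arguments as slot names, so this fails
## ("invalid names for slots"):
err <- tryCatch(new("GxListSketch", a = 1, b = 2), error = function(e) e)

## A constructor collects any number of named elements into listData,
## so neither their number nor their names need to be fixed in advance.
GxListSketch <- function(...) new("GxListSketch", listData = list(...))

lst <- GxListSketch(a = matrix(1, 10, 4), b = matrix(3, 10, 4))
names(lst@listData)
```

Here names(lst@listData) returns "a" "b".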
>>>
>>> # for completeness ...
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C
>> [5] LC_TIME=French_France.1252
>>
>> attached base packages:
>> [1] grDevices datasets splines graphics stats tcltk utils methods base
>>
>> other attached packages:
>> [1] affy_1.28.0 IRanges_1.8.0 Biobase_2.10.0 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.35-8
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.18.0 cluster_1.13.1 grid_2.12.0 lattice_0.19-13 preprocessCore_1.12.0 svMisc_0.9-60
>> [7] tools_2.12.0
>>>
>>
>>
>>
>>
>> ________________________________________
>> From: Martin Morgan [mtmorgan at fhcrc.org]
>> Sent: Friday, 5 November 2010 18:33
>> To: Wolfgang RAFFELSBERGER
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] designing an eSet derived object
>>
>> On 11/05/2010 05:02 AM, Wolfgang RAFFELSBERGER wrote:
>>> Dear list,
>>>
>>
>>> basically I'm trying to design an object to contain the
>>> following microarray data:
>>> 1) "gxIndData": microarray data normalized in parallel by an
>>>    (array-dependent) number n of methods, plus the corresponding
>>>    expression calls (again, <= n methods),
>>> 2) "gxAvData": derived values (replicate averages, SEMs, etc.),
>>> 3) gene/spot annotation,
>>> 4) sample description,
>>> 5) various supplementary information (parameters, notes, versions, etc.)
>>>
>>> Overall, this is a modified/extended version of the Biobase eSet
>>> concept, and I'm trying to figure out whether there is a way to
>>> use the Biobase eSet. This way I hope to maintain a decent level
>>> of compatibility with other Bioconductor methods and allow
>>> code reuse.
>>>
>>> Now I'd like to store the various sections of 1) and 2) as
>>> separate lists of n matrices of values to keep things
>>> organized.
>>>
>>> Following section 5 ("Extending eSet") of the vignette "Biobase
>>> development and the new eSet", I defined a new class extending 'eSet'.
>>> But as soon as I put something other than matrices at
>>> the level of 'AssayData', I get an error message (see code below),
>>> no matter whether these are simple lists or custom objects. I
>>> suppose this means that I would have to store all matrices (up to
>>> 10 * 6 methods = 60 matrices) without further organization at the
>>> level of 'AssayData'.
>>
>> eSet requires that all AssayData elements are two-dimensional with
>> identical dimensions, so a list-of-matrices would not work.
>>
>>> However, I'd like to keep at least one (in my case preferably 2)
>>> additional levels of hierarchy to keep the data organized.
>>>
>>> So, ultimately I would like to put two new classes, for 1) and
>>> 2), into the assayData slot of my modified/new eSet.
>>>
>>> Does this mean this is not possible and that I cannot use
>>> 'eSet' for my purposes? Do I have to create a novel class that is
>>> somehow equivalent to, but ultimately incompatible with, 'eSet'?
>>>
>>> Any suggestions/hints ?
>>
>> One possibility, if this is for your own use and not as the
>> foundation for a package, is to use NChannelSet, where each method
>> is a 'channel'.
>>
>> Another possibility is to create a class that extends eSet with a
>> slot containing, e.g., an AnnotatedDataFrame with columns
>> describing the AssayData, and a method to query the slot / select
>> the appropriate assayData elements.
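A rough sketch of this second suggestion, using only the methods package: a container with a named list of matrices plus a data.frame holding one descriptor row per element, and a small method that selects elements by a descriptor column. All names here ('DescribedAssays', 'assays', 'assayInfo', 'selectAssays', the 'method' column) are hypothetical; in the actual suggestion the class would extend eSet and the descriptor would be an AnnotatedDataFrame, which this sketch does not attempt.

```r
library(methods)

## Hypothetical container: assay matrices plus one descriptor row each.
setClass("DescribedAssays",
         representation(assays = "list", assayInfo = "data.frame"))

setValidity("DescribedAssays", function(object) {
    if (nrow(object@assayInfo) != length(object@assays))
        return("'assayInfo' needs one row per element of 'assays'")
    TRUE
})

setGeneric("selectAssays", function(x, keep) standardGeneric("selectAssays"))

## Keep the elements whose descriptor row satisfies 'keep' (logical);
## NA is treated as FALSE.
setMethod("selectAssays", "DescribedAssays", function(x, keep) {
    keep <- keep & !is.na(keep)
    new("DescribedAssays", assays = x@assays[keep],
        assayInfo = x@assayInfo[keep, , drop = FALSE])
})

da <- new("DescribedAssays",
          assays = list(rma = matrix(1, 10, 4), mas5 = matrix(3, 10, 4)),
          assayInfo = data.frame(method = c("RMA", "MAS5")))
sel <- selectAssays(da, da@assayInfo$method == "RMA")
names(sel@assays)
```

With this toy data, names(sel@assays) is "rma". Subsetting the list and the descriptor rows together keeps the two in step, which the validity method then re-checks.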
>>
>> And perhaps what you really have is more a list (of lists) of
>> ExpressionSets, each element of the list with additional
>> information. An approach here would use the IRanges 'SimpleList'
>> infrastructure, e.g.,
>>
>>> lst = SimpleList(a=new("ExpressionSet"), b=new("ExpressionSet"))
>>> elementMetadata(lst) = DataFrame(method=c("A", "B"))
>>> lst[elementMetadata(lst)$method == "A"]
>> SimpleList of length 1 names(1): a
>>> lst[elementMetadata(lst)$method == "A"][[1]]
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 0 features, 0 samples
>>   element names: exprs
>> protocolData: none
>> phenoData: none
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation:
>>
>> Martin
>>
>>>
>>> Thanks in advance, Wolfgang
>>>
>>> ##
>>>
>>> require(Biobase)
>>> setClass("gxSet", contains = "eSet")
>>> setMethod("initialize", "gxSet",
>>>     function(.Object, A=new("list"), B=new("list"), ...) {
>>>         callNextMethod(.Object, A=A, B=B, ...)
>>>     })
>>> new("gxSet")
>>> ## produces : Error in function (storage.mode = c("lockedEnvironment", "environment", :
>>> ##   'AssayData' elements with invalid dimensions: 'A' 'B'
>>>
>>>
>>> ## ideally I'd like to use
>>> setClass("gxIndData", representation(SIdata="list", SIcall="list"))
>>>
>>> setClass("gxAvData", representation(avSI="list", expressed="list", SEM="list",
>>>     conCall="list", FC="list", FiltFin="list", FiltSI="list", FiltOther="list"))
>>> setClass("gxSet", contains = "eSet")
>>>
>>> setMethod("initialize","gxSet",
>>>     function(.Object,
>>>              assayData=assayDataNew(IndData=IndData, AvData=AvData),
>>>              IndData=new("gxIndData"), AvData=new("gxAvData"), ...) {
>>>         if(!missing(assayData) && any(!missing(IndData), !missing(AvData))) {
>>>             warning("using 'assayData'; ignoring 'IndData', 'AvData'")
>>>         }
>>>         callNextMethod(.Object, assayData = assayData, ...)
>>>     })
>>>
>>> new("gxSet")
>>> ## produces : Error in assayDataNew(IndData = IndData, AvData = AvData) :
>>> ##   'AssayData' elements with invalid dimensions: 'AvData' 'IndData'
>>>
>>>
>>> ## the alternative : an eSet 'like' but independent and incompatible object ..
>>> setClass("gxSet", representation(IndData="gxIndData", AvData="gxAvData",
>>>     phenoData="AnnotatedDataFrame", featureData="AnnotatedDataFrame",
>>>     experimentData="MIAME", annotation="character",
>>>     protocolData="AnnotatedDataFrame", notes="list"))
>>>
>>>
>>>
>>> ## for completeness:
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
>>> [4] LC_NUMERIC=C LC_TIME=French_France.1252
>>>
>>> attached base packages:
>>> [1] grDevices datasets splines graphics stats tcltk utils methods base
>>>
>>> other attached packages:
>>> [1] affy_1.28.0 Biobase_2.10.0 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.35-8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.18.0 cluster_1.13.1 grid_2.12.0 lattice_0.19-13 preprocessCore_1.12.0
>>> [6] svMisc_0.9-60 tools_2.12.0
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________ Bioconductor
>>> mailing list Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>> archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861 Telephone: 206 667-2793
>
>
>