[Bioc-devel] coerce ExpressionSet to SummarizedExperiment

Mon Sep 11 17:56:22 CEST 2017

I guess we discussed this with Davide Risso @Bioc2017 in the
MultiAssayExperiment workshop.

> SummarizedExperiment(mouseData)

puts the eSet (rather counterintuitively) into the `assays` of the
`SummarizedExperiment`; it does not really coerce it to a
`SummarizedExperiment`, e.g. `fData` and `pData` are not transferred
to `rowData` and `colData`, respectively.
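
For illustration, the transfer described above can be done by hand. This is
only a minimal sketch, assuming Biobase and SummarizedExperiment are
installed; `eset_to_se` is a hypothetical helper name (it is not part of
either package), and it makes no attempt to map features to genomic ranges:

```r
library(Biobase)
library(SummarizedExperiment)

# Hypothetical helper: copy assays, feature data and sample data across.
eset_to_se <- function(eset) {
    stopifnot(is(eset, "eSet"))
    SummarizedExperiment(
        # assayData() may be an environment or a list; as.list() handles both
        assays  = as.list(assayData(eset)),
        rowData = as(fData(eset), "DataFrame"),  # feature annotation
        colData = as(pData(eset), "DataFrame")   # sample annotation
    )
}

data(sample.ExpressionSet)
se <- eset_to_se(sample.ExpressionSet)
se
```

Because the helper only checks for `eSet`, it should in principle also accept
classes that extend `eSet` without being an `ExpressionSet`.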

While I can understand that this is by design of `SummarizedExperiment`, I
really wonder whether there are use cases where somebody would want to put
an `ExpressionSet` into the `assays` of a `SummarizedExperiment` rather than
coerce it as described above.

Furthermore, if you would indeed like to have several `ExpressionSet`s in
a `SummarizedExperiment`, haven't you already arrived at a scenario where
use of `MultiAssayExperiment` is indicated?
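
On the quoted question of why `as()` fails for an object that extends eSet:
a plausible reading, assuming the existing coercion is registered for
`ExpressionSet` rather than for `eSet`, is that a sibling class such as
`MRexperiment` has no method to inherit. The toy classes below are
hypothetical stand-ins (none of them exist in Bioconductor) and only sketch
that dispatch behaviour with base R's methods package:

```r
library(methods)

# Toy stand-ins for the real classes (all names are made up for this sketch)
setClass("ToyESet", representation("VIRTUAL", x = "numeric"))  # role of eSet
setClass("ToyExpressionSet", contains = "ToyESet")   # role of ExpressionSet
setClass("ToyMRexperiment", contains = "ToyESet")    # role of MRexperiment
setClass("ToySE", representation(y = "numeric"))     # role of the target class

# Coercion registered for one subclass only
setAs("ToyExpressionSet", "ToySE", function(from) new("ToySE", y = from@x))

# Works: a method exists for exactly this class
ok <- as(new("ToyExpressionSet", x = 1), "ToySE")

# Fails: the sibling class does not inherit the sibling's method
res <- tryCatch(as(new("ToyMRexperiment", x = 2), "ToySE"),
                error = function(e) e)
sibling_fails <- inherits(res, "error")

# Registering the coercion on the shared parent covers every subclass
setAs("ToyESet", "ToySE", function(from) new("ToySE", y = from@x))
ok2 <- as(new("ToyMRexperiment", x = 2), "ToySE")
```

Under that assumption, a coercion method defined on `eSet` itself would cover
`MRexperiment` and similar classes as well.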

> Thanks Martin! I see the RangedSummarizedExperiment coercion method works
> when there are no mappable ranges (for example curatedMetagenomicData
> ExpressionSet objects), although the rowRanges is a GRangesList of empty
> elements. It might be worth also having a SummarizedExperiment coercion
> method if it's not a problematic or big job. And now I suppose I can ask
> the question I *really* wanted to know, which is why can't I coerce an
> object that extends eSet? I can still use the SummarizedExperiment()
> constructor, but for example:
>
> > library(metagenomeSeq)
> > data(mouseData)
> > class(mouseData)
> [1] "MRexperiment"
> attr(,"package")
> [1] "metagenomeSeq"
> > is(mouseData, "ExpressionSet")
> [1] FALSE
> > is(mouseData, "eSet")
> [1] TRUE
>
> > SummarizedExperiment(mouseData)
> class: SummarizedExperiment
> dim: 10172 139
> metadata(0):
> assays(1): ''
> rownames(10172): Prevotellaceae:1 Lachnospiraceae:1 ... Bryantella:103
>   Parabacteroides:956
> rowData names(0):
> colnames(139): PM1:20080107 PM1:20080108 ... PM9:20080225 PM9:20080303
> colData names(0):
>
> > as(mouseData, "RangedSummarizedExperiment")
> Error in as(mouseData, "RangedSummarizedExperiment") :
>   no method or default for coercing “MRexperiment” to “RangedSummarizedExperiment”
> > as(mouseData, "SummarizedExperiment")
> Error in as(mouseData, "SummarizedExperiment") :
>   no method or default for coercing “MRexperiment” to “SummarizedExperiment”
> > as(mouseData, "ExpressionSet")
> Error in updateOldESet(from, "ExpressionSet") :
>   no slot of name "pData" for this object of class "AnnotatedDataFrame"
>
>
>
>
> On Mon, Sep 11, 2017 at 6:58 AM, Martin Morgan <
> martin.morgan at roswellpark.org> wrote:
>
>> On 09/10/2017 08:38 PM, Levi Waldron wrote:
>>
>>> I just dug up this old thread because I realized we still don't have a
>>> coercion method as(sample.ExpressionSet, "SummarizedExperiment"). Since
>>> we do have SummarizedExperiment(sample.ExpressionSet), could the coercion
>>> method also be added easily?
>>
>> try as(sample.ExpressionSet, "RangedSummarizedExperiment"); see
>> ?makeSummarizedExperimentFromExpressionSet
>>
>>> > library(Biobase)
>>> > library(SummarizedExperiment)
>>> > example("ExpressionSet")
>>> > SummarizedExperiment(sample.ExpressionSet)
>>> class: SummarizedExperiment
>>> dim: 500 26
>>> metadata(0):
>>> assays(1): ''
>>> rownames(500): AFFX-MurIL2_at AFFX-MurIL10_at ... 31738_at 31739_at
>>> rowData names(0):
>>> colnames(26): A B ... Y Z
>>> colData names(0):
>>> > as(sample.ExpressionSet, "SummarizedExperiment")
>>> Error in as(sample.ExpressionSet, "SummarizedExperiment") :
>>>   no method or default for coercing “ExpressionSet” to “SummarizedExperiment”
>>> > sessionInfo()
>>> R version 3.4.0 RC (2017-04-20 r72569)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 16.04.3 LTS
>>>
>>> Matrix products: default
>>> BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
>>> LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> LC_TIME=en_US.UTF-8
>>>   [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
>>> LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8
>>> LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats4    parallel  stats     graphics  grDevices utils
>>> datasets  methods   base
>>>
>>> other attached packages:
>>> [1] SummarizedExperiment_1.7.5 DelayedArray_0.3.16
>>> matrixStats_0.52.2
>>> [4] GenomicRanges_1.29.6       GenomeInfoDb_1.13.4
>>> IRanges_2.11.7
>>> [7] S4Vectors_0.15.5           Biobase_2.37.2
>>> BiocGenerics_0.23.0
>>>
>>> loaded via a namespace (and not attached):
>>>   [1] lattice_0.20-35         bitops_1.0-6            grid_3.4.0
>>>   [4] zlibbioc_1.23.0         XVector_0.17.0          Matrix_1.2-11
>>>   [7] tools_3.4.0             RCurl_1.95-4.8          compiler_3.4.0
>>> [10] GenomeInfoDbData_0.99.1
>>>
>>>
>>>>
>>> On Mon, Sep 22, 2014 at 1:54 AM, Hervé Pagès <hpages at fhcrc.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 09/20/2014 11:14 AM, Martin Morgan wrote:
>>>>
>>>>> On 09/20/2014 10:43 AM, Sean Davis wrote:
>>>>>
>>>>>> Hi, Vince.
>>>>>>
>>>>>> Looks like a good start.  I'd probably pull all the assays from
>>>>>> ExpressionSet into SummarizedExperiment as the default, avoiding data
>>>>>> coercion methods that are unnecessarily lossy.  Also, as it stands, the
>>>>>> assayname argument is not used anyway?
>>>>>>
>>>>> I think there will be some resistance to uniting the 'Biobase' and
>>>>> 'IRanges' realms under 'GenomicRanges';
>>>>>
>>>> This coercion method could be defined (1) in Biobase (where
>>>> ExpressionSet is defined), (2) in GenomicRanges (where
>>>> SummarizedExperiment is defined), or (3) in a package that
>>>> depends on Biobase and GenomicRanges.
>>>>
>>>> Since it's probably undesirable to make Biobase depend on GenomicRanges
>>>> or vice-versa, we would need to use Suggests for (1) or (2). That
>>>> means we would get a note like this at installation time:
>>>>
>>>>   ** preparing package for lazy loading
>>>>   in method for ‘coerce’ with signature ‘"ExpressionSet","SummarizedExperiment"’:
>>>>   no definition for class “SummarizedExperiment”
>>>>
>>>> Not very clean but it works.
>>>>
>>>> (3) is a cleaner solution but then the coercion method would
>>>> not necessarily be available to the user when s/he needs it (unless
>>>> s/he knows what extra package to load). The obvious advantage of
>>>> putting the method in Biobase is that if a user has an ExpressionSet,
>>>> then s/he necessarily has Biobase attached and the method is already
>>>> in her/his search path.
>>>>
>>>> Another solution would be (4) to move SummarizedExperiment somewhere
>>>> else. That would be in a package that depends on GenomicRanges and
>>>> Biobase, and the coercion method would be defined there.
>>>>
>>>> H.
>>>>
>>>>
>>>>> considerable effort has gone in
>>>>> to making a rational hierarchy of package dependencies [perhaps Herve
>>>>> will point to some of his ASCII art on the subject].
>>>>>
>>>>> I have some recollection of (recent) discussion related to this topic in
>>>>> the DESeq2 realm, but am drawing a blank; presumably Michael or Wolfgang
>>>>> or ... will chime in.
>>>>>
>>>>> Martin
>>>>>
>>>>>> Sean
>>>>>>
>>>>>> On Sat, Sep 20, 2014 at 10:38 AM, Vincent Carey
>>>>>> <stvjc at channing.harvard.edu> wrote:
>>>>>>
>>>>>>> do we have a facility for this?
>>>>>>>
>>>>>>> if not, we have
>>>>>>>
>>>>>>> https://github.com/vjcitn/biocMultiAssay/blob/master/R/exs2se.R
>>>>>>>
>>>>>>> https://github.com/vjcitn/biocMultiAssay/blob/master/man/coerce-methods.Rd
>>>>>>>
>>>>>>>
>>>>>>> it occurred to me that we might want something like this in
>>>>>>> GenomicRanges (that's where SummarizedExperiment is managed, right?)
>>>>>>> and I will add it if there are no objections
>>>>>>>
>>>>>>> the arguments are currently
>>>>>>>
>>>>>>>        assayname = "exprs",    # for naming SimpleList element
>>>>>>>        fngetter =
>>>>>>>              function(z) rownames(exprs(z)),   # extract usable feature names
>>>>>>>        annDbGetter =
>>>>>>>             function(z) {
>>>>>>>                 # strip any trailing ".db" so the suffix is not doubled
>>>>>>>                 clnanno = sub("\\.db$", "", annotation(z))
>>>>>>>                 stopifnot(require(paste0(clnanno, ".db"),
>>>>>>>                                   character.only=TRUE) )
>>>>>>>                 # obtain resource for mapping feature names to coordinates
>>>>>>>                 get(paste0(clnanno, ".db"))
>>>>>>>                 },
>>>>>>>        probekeytype = "PROBEID",   # chipDb field to use
>>>>>>>        duphandler = function(z) {    # action to take to process duplicated features
>>>>>>>             if (any(isd <- duplicated(z[,"PROBEID"])))
>>>>>>>                 return(z[!isd,,drop=FALSE])
>>>>>>>             z
>>>>>>>             },
>>>>>>>        signIsStrand = TRUE,   # verify that signs of addresses define strand
>>>>>>>        ucsdChrnames = TRUE    # prefix 'chr' to chromosome token
>>>>>>>
>>>>>>>           [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>
>>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Levi Waldron
> http://www.waldronlab.org
> Assistant Professor of Biostatistics     CUNY School of Public Health
> US: +1 646-364-9616                                           Skype:
> levi.waldron

-- 
Dr. Ludwig Geistlinger
eMail: Ludwig.Geistlinger at bio.ifi.lmu.de