[Bioc-devel] coerce ExpressionSet to SummarizedExperiment

Hervé Pagès hpages at fredhutch.org
Thu Sep 14 03:19:47 CEST 2017


One more thing. See below...

On 09/13/2017 02:54 PM, Ludwig Geistlinger wrote:
> Coercing vice versa, i.e. from SummarizedExperiment to ExpressionSet,
> which is defined in
>
> SummarizedExperiment/R/makeSummarizedExperimentFromExpressionSet.R
>
> as follows:
>
> setAs("SummarizedExperiment", "ExpressionSet", function(from)
>      as(as(from, "RangedSummarizedExperiment"), "ExpressionSet")
> )
>
> also seems to be a bit problematic, as it makes you lose your rowData/fData.
>
>
>
> Here is an example:
>
> ## Constructing the SE similar to examples of ?SummarizedExperiment
>> nrows <- 200; ncols <- 6
>> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
>                            row.names=LETTERS[1:6])
>
>
> ## some rowData with simulated gene IDs
>> rowData <- DataFrame(EntrezID=sample(1000, 200),
>                            row.names=paste0("g", 1:200))
>> se <- SummarizedExperiment(assays=SimpleList(exprs=counts),
>                              colData=colData, rowData=rowData)
>
> # this is how it looks
>> rowData(se)
> DataFrame with 200 rows and 1 column
>       EntrezID
>      <integer>
> 1         289
> 2         476
> 3         608
> 4         998
> 5         684
> ...       ...
> 196       331
> 197       590
> 198       445
> 199        95
> 200       129
>
> (why did I actually lose the rownames g1-g200 here?)

Your rownames were moved to the names of the object:

 > head(names(se))
[1] "g1" "g2" "g3" "g4" "g5" "g6"

The rowData() accessor does not restore them by default. You need to
pass 'use.names=TRUE' to get them back. Note that rowData() is just an
alias for mcols(), so the mcols() accessor behaves the same way.

 > rowData(se, use.names=TRUE)
DataFrame with 200 rows and 1 column
       EntrezID
      <integer>
g1         616
g2          45
g3         944
g4         632
g5         270
...        ...
g196       827
g197       943
g198       291
g199       432
g200       106

All Vector derivatives do that (e.g. GRanges), not just
SummarizedExperiment.
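
To illustrate with GRanges, here is a minimal sketch. The object and
the IDs are made up for illustration, and 'use.names' is passed
explicitly in both calls because its default may differ between
package versions:

```r
library(GenomicRanges)

# Toy GRanges with names and one metadata column (values are made up).
gr <- GRanges("chr1", IRanges(start=1:3, width=10),
              EntrezID=c(11L, 22L, 33L))
names(gr) <- c("g1", "g2", "g3")

# The names live on the object itself...
names(gr)

# ...and are only propagated to the metadata columns on request.
rownames(mcols(gr, use.names=FALSE))   # NULL
rownames(mcols(gr, use.names=TRUE))    # "g1" "g2" "g3"
```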

The reason for this design is that the rownames of a data frame must be
unique (this is a base R requirement), whereas the names of a Vector
derivative don't have to be. By moving the rownames from the DataFrame
containing the metadata columns to the names of the object, Vector
derivatives can be subsetted in a way that repeats some of their
elements. If the rownames were kept on the DataFrame containing the
metadata columns, these subsetting operations wouldn't be possible.
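
A small sketch of the difference, using a base R data.frame on one side
and an IRanges object as a stand-in for any Vector derivative on the
other (the toy objects and IDs are made up for illustration):

```r
library(IRanges)

# Base R: a data.frame cannot carry duplicated row names, so repeating
# a row forces R to rename it.
df <- data.frame(EntrezID=c(11L, 22L, 33L),
                 row.names=c("g1", "g2", "g3"))
df2 <- df[c(1, 1, 2), , drop=FALSE]
rownames(df2)          # "g1" "g1.1" "g2"

# Vector derivative: the names sit on the object, where duplicates are
# allowed, so the repeated element keeps its original name.
ir <- IRanges(start=1:3, width=10, names=c("g1", "g2", "g3"))
mcols(ir)$EntrezID <- c(11L, 22L, 33L)
ir2 <- ir[c(1, 1, 2)]
names(ir2)             # "g1" "g1" "g2"
mcols(ir2)$EntrezID    # 11 11 22
```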

Hope this makes sense,
H.

>
>
> ## Coercing to ExpressionSet makes me lose the rowData/fData
>> eset <- as(se, "ExpressionSet")
>> fData(eset)
> data frame with 0 columns and 200 rows
>
>
> ## So where is the problem?
> ## Apparently in the coercion
> ##    from SummarizedExperiment to RangedSummarizedExperiment
>> rse <- as(se, "RangedSummarizedExperiment")
>> rowData(rse)
> DataFrame with 200 rows and 0 columns
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


