[Bioc-devel] coerce ExpressionSet to SummarizedExperiment
Hervé Pagès
hpages at fredhutch.org
Thu Sep 14 03:19:47 CEST 2017
One more thing. See below...
On 09/13/2017 02:54 PM, Ludwig Geistlinger wrote:
> Coercing vice versa, i.e. from SummarizedExperiment to ExpressionSet,
> which is defined in
>
> SummarizedExperiment/R/makeSummarizedExperimentFromExpressionSet.R
>
> as follows:
>
> setAs("SummarizedExperiment", "ExpressionSet", function(from)
> as(as(from, "RangedSummarizedExperiment"), "ExpressionSet")
> )
>
> also seems to be a bit problematic, as it makes you lose your rowData/fData.
>
>
>
> Here is an example:
>
> ## Constructing the SE similar to examples of ?SummarizedExperiment
>> nrows <- 200; ncols <- 6
>> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
>                       row.names=LETTERS[1:6])
>
>
> ## some rowData with simulated gene IDs
>> rowData <- DataFrame(EntrezID=sample(1000, 200),
>                       row.names=paste0("g", 1:200))
>> se <- SummarizedExperiment(assays=SimpleList(exprs=counts),
>                             colData=colData, rowData=rowData)
>
> # this is how it looks
>> rowData(se)
> DataFrame with 200 rows and 1 column
>      EntrezID
>     <integer>
> 1         289
> 2         476
> 3         608
> 4         998
> 5         684
> ...       ...
> 196       331
> 197       590
> 198       445
> 199        95
> 200       129
>
> (why did I actually lose the rownames g1-g200 here?)
Your rownames were moved to the names of the object:
> head(names(se))
[1] "g1" "g2" "g3" "g4" "g5" "g6"
The rowData() accessor, like the mcols() accessor (note that rowData()
is just an alias for mcols()), does not restore them unless you use
'use.names=TRUE':
> rowData(se, use.names=TRUE)
DataFrame with 200 rows and 1 column
      EntrezID
     <integer>
g1         616
g2          45
g3         944
g4         632
g5         270
...        ...
g196       827
g197       943
g198       291
g199       432
g200       106
All Vector derivatives (e.g. GRanges) behave this way, not just
SummarizedExperiment.
The reason for this design is that rownames must be unique
(this is a base R requirement). By moving them from the DataFrame
containing the metadata columns to the names of the object, Vector
derivatives can be subsetted in a way that repeats some of their
elements (e.g. x[c(1, 1, 2)]). If the rownames were stored on the
DataFrame containing the metadata columns, such subsetting operations
wouldn't be possible.
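For illustration, here is a small base-R-only sketch of the uniqueness constraint ('df' and 'x' are just toy objects; the three EntrezID values are borrowed from the rowData(se) output quoted above). Subsetting a data.frame with a repeated index forces its row names to be made unique, and setting duplicated row names directly is an error, whereas the names of a plain named vector survive the same subsetting unchanged:

```r
## Toy data.frame: base R requires its row names to be unique
df <- data.frame(EntrezID = c(289L, 476L, 608L),
                 row.names = c("g1", "g2", "g3"))

## Subsetting with a repeated index silently rewrites the duplicate row name
df_sub <- df[c(1, 1, 2), , drop = FALSE]
rownames(df_sub)        # "g1"   "g1.1" "g2"

## Setting duplicated row names directly signals an error
## (preceded by a warning, which is suppressed here)
res <- tryCatch(suppressWarnings(rownames(df) <- c("g1", "g1", "g2")),
                error = function(e) e)
inherits(res, "error")  # TRUE; df keeps its original row names

## A plain named vector keeps duplicated names as-is
x <- c(g1 = 289L, g2 = 476L, g3 = 608L)
names(x[c(1, 1, 2)])    # "g1" "g1" "g2"
```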
Hope this makes sense,
H.
>
>
> ## Coercing to ExpressionSet makes me lose the rowData/fData
>> eset <- as(se, "ExpressionSet")
>> fData(eset)
> data frame with 0 columns and 200 rows
>
>
> ## So where is the problem?
> ## Apparently in the coercion
> ## from SummarizedExperiment to RangedSummarizedExperiment
>> rse <- as(se, "RangedSummarizedExperiment")
>> rowData(rse)
> DataFrame with 200 rows and 0 columns
>
>
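Until the SummarizedExperiment to ExpressionSet coercion keeps the rowData, one possible workaround is to build the ExpressionSet by hand instead of going through as(se, "ExpressionSet"). The following is only a sketch: se_to_eset() is a hypothetical helper (not part of SummarizedExperiment or Biobase), and it assumes the expression values are stored in an assay named "exprs" and that the object has row and column names, as in the example above. It passes rowData(se, use.names=TRUE) as the featureData, so the fData keep both the EntrezID column and the g1-g200 feature names:

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)  # SummarizedExperiment(), assay(), rowData(), colData()
  library(Biobase)               # ExpressionSet(), AnnotatedDataFrame(), fData()
})

## Hypothetical helper, not part of any package: converts a
## SummarizedExperiment to an ExpressionSet and carries rowData over as
## featureData. Assumes an assay named `assay_name` and non-NULL dimnames.
se_to_eset <- function(se, assay_name = "exprs") {
  ## Assay matrix; its dimnames come from names(se) and colnames(se)
  m <- as.matrix(assay(se, assay_name))
  ## use.names = TRUE restores the rownames so they match rownames(m)
  fd <- as.data.frame(rowData(se, use.names = TRUE))
  pd <- as.data.frame(colData(se))
  ExpressionSet(assayData   = m,
                phenoData   = AnnotatedDataFrame(pd),
                featureData = AnnotatedDataFrame(fd))
}

## Same kind of toy object as in the example above
set.seed(123)
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
cd <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                row.names = LETTERS[1:6])
rd <- DataFrame(EntrezID = sample(1000, 200),
                row.names = paste0("g", 1:200))
se <- SummarizedExperiment(assays = SimpleList(exprs = counts),
                           colData = cd, rowData = rd)

eset <- se_to_eset(se)
dim(fData(eset))   # 200 rows, 1 column
```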
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319