[Bioc-devel] Changes to the SummarizedExperiment Class
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Mon Mar 9 15:36:13 CET 2015
On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:
> I am glad you are keeping this discussion alive Kasper.
>
> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen <
> kasperdanielhansen at gmail.com> wrote:
>
>> It sounds like the proposed changes are already made. However (like
>> others) I am still a bit mystified why this was necessary. The old version
>> did allow for a GRanges inside the DataFrame of the rowData, as far as I
>> recall. So I assume this is for efficiency. But why? What kind of
>> data/use cases is this for?
>>
>> I am happy to hear that SummarizedExperiment is going to be spun out into
>> its own package. When that happens, I have some comments, which I'll
>> include here in anticipation:
>> 1) I now very strongly believe it was a design mistake to not have
>> colnames on the assays. The advantage of this choice is that sampleNames
>> are only stored one place. The extreme disadvantage is the high
>> inefficiency when you want colnames on an extracted assay.
>>
>
> after example(SummarizedExperiment)
>
> > colnames(assays(se1)[[1]])
> [1] "A" "B" "C" "D" "E" "F"
>
> so this seems to be optional. But attempts to set rownames will fail
> silently
>
> > rownames(assays(se1)[[1]]) = as.character(1:200)
>
> > rownames(assays(se1)[[1]])
>
> NULL
> seems we could issue a warning there
>
Vince, you need to be careful here.
The assays are stored without colnames (unless something has recently
changed). By default, the colnames are set on the matrix upon extraction,
which is why they show up in your example. This, however, requires a copy
of the entire matrix, so each assay is needlessly duplicated on every
extraction just to add the colnames. This is what I mean by inefficient.
I would prefer to store the assays with colnames. This means that changing
sampleNames of the object will be inefficient (as it is for eSets), since
it would require a complete copy of everything. But I would much rather
copy when setting sampleNames than copy when extracting an assay.
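
To illustrate the trade-off, here is a minimal base-R sketch. It does not use
the actual SummarizedExperiment internals; the names `stored`, `stored_named`,
`sample_names` and `extract_with_names` are made up for illustration, and the
200 x 6 dimensions simply mirror the example quoted above. tracemem() only
reports copies if R was built with memory profiling, hence the capabilities()
check.

```r
# Hypothetical stand-ins, not the real SummarizedExperiment storage.
sample_names <- c("A", "B", "C", "D", "E", "F")

# Strategy 1: store the assay WITHOUT colnames, attach them on extraction.
stored <- matrix(0, nrow = 200, ncol = 6)

extract_with_names <- function(m, nms) {
  # Modifying the dimnames of a shared matrix makes R duplicate it,
  # so every extraction pays for a full copy.
  colnames(m) <- nms
  m
}

# Strategy 2: store the assay WITH colnames. The copy is paid once,
# when the names are set; extraction just returns the stored object.
stored_named <- stored
colnames(stored_named) <- sample_names

# Optionally watch the copy happen (needs R built with memory profiling).
tracing <- capabilities("profmem")
if (tracing) invisible(tracemem(stored))
a1 <- extract_with_names(stored, sample_names)  # copy at extraction time
if (tracing) untracemem(stored)

a2 <- stored_named                              # no modification, no copy
```

Both strategies hand back the same named matrix; they differ only in when the
duplication happens. With the second one, renaming samples has to touch every
stored assay, which is the cost I am willing to accept.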
Best,
Kasper