[Bioc-devel] Changes to the SummarizedExperiment Class

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Mon Mar 9 15:36:13 CET 2015


On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:

> I am glad you are keeping this discussion alive Kasper.
>
> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
>
>> It sounds like the proposed changes have already been made. However,
>> like others, I am still a bit mystified as to why this was necessary. As
>> far as I recall, the old version did allow a GRanges inside the DataFrame
>> of the rowData. So I assume this is for efficiency. But why? What kind of
>> data or use cases is this for?
>>
>> I am happy to hear that SummarizedExperiment is going to be spun out into
>> its own package. When that happens, I have some comments, which I'll
>> include here in anticipation:
>>   1) I now very strongly believe it was a design mistake not to have
>> colnames on the assays. The advantage of this choice is that sampleNames
>> are stored in only one place. The severe disadvantage is the high
>> inefficiency when you want colnames on an extracted assay.
>>
>
> After example(SummarizedExperiment):
>
> > colnames(assays(se1)[[1]])
> [1] "A" "B" "C" "D" "E" "F"
>
> So this seems to be optional. But attempts to set rownames fail
> silently:
>
> > rownames(assays(se1)[[1]]) = as.character(1:200)
>
> > rownames(assays(se1)[[1]])
> NULL
>
> It seems we could issue a warning there.
>


Vince, you need to be careful here.

The assays are stored without colnames (unless something has recently
changed). By default, the colnames of the matrix are set upon extraction.
This, however, requires a copy of the entire matrix, so every extracted
assay is needlessly duplicated just to add the colnames. This is what I
mean by inefficient. I would prefer to store the assays with colnames.
This means that changing the sampleNames of the object would be
inefficient (as it is for eSets), since it would require a complete copy
of everything. But I would much rather copy when setting sampleNames than
when extracting an assay.
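A minimal base-R sketch of the two strategies, for illustration only. It
uses a hypothetical list-based toy container, not the actual
SummarizedExperiment class, and the helper names (make_toy_se,
get_assay_named_on_extract, get_assay_stored_names, set_sample_names_all)
are invented here. The point is where R's copy-on-modify semantics force
the duplication: on every extraction, or only when sample names change.

```r
# Hypothetical toy container (NOT the SummarizedExperiment API):
# a list of assay matrices plus a single vector of sample names.
make_toy_se <- function(assays, sample_names) {
  list(assays = assays, sample_names = sample_names)
}

# Strategy A (as described above): assays are stored without colnames
# and names are attached on extraction. The matrix is still referenced
# by the container, so `colnames<-` makes R duplicate it: every
# extraction copies the full assay.
get_assay_named_on_extract <- function(se, i) {
  x <- se$assays[[i]]
  colnames(x) <- se$sample_names
  x
}

# Strategy B (the proposed alternative): assays carry their own colnames,
# so extraction returns the stored matrix without copying it ...
get_assay_stored_names <- function(se, i) {
  se$assays[[i]]
}

# ... while renaming samples rewrites (and therefore copies) every assay.
set_sample_names_all <- function(se, value) {
  se$sample_names <- value
  se$assays <- lapply(se$assays, function(m) {
    colnames(m) <- value
    m
  })
  se
}

# Usage: 2 features x 6 samples.
m <- matrix(seq_len(12), nrow = 2)
se_a <- make_toy_se(list(counts = m), LETTERS[1:6])
a <- get_assay_named_on_extract(se_a, "counts")
colnames(a)                            # "A" "B" "C" "D" "E" "F"
is.null(colnames(se_a$assays$counts))  # TRUE: stored matrix stays unnamed

se_b <- set_sample_names_all(se_a, LETTERS[1:6])
b <- get_assay_stored_names(se_b, "counts")
identical(a, b)                        # TRUE: same result, different cost
```

Both strategies return the same named matrix. They differ only in which
operation pays for the copy.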

Best,
Kasper



