[Bioc-devel] do SummarizedExperiments really need colnames?
Aaron Lun alun at wehi.edu.au
Sat Dec 5 21:50:26 CET 2015
Option (2) sounds nice. Even if you used (3), I would end up naming most
of my columns by seq_len anyway. I usually don't have very good
candidates for column names for my counting functions; the inputs to the
functions are BAM/other file paths that can get quite long, and I don't
like having spaced-out columns when I look at my assays.
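
For concreteness, here is a minimal base-R sketch of what option (2) might amount to on a plain matrix. The helper name_by_position() is hypothetical and is not part of SummarizedExperiment; it only assumes that missing dimnames get filled in with positional indices, while any existing names are left alone.

```r
# Hypothetical helper (not part of any package): fill in missing
# row/column names with positional indices, as in option (2).
name_by_position <- function(m) {
    if (is.null(rownames(m))) {
        rownames(m) <- as.character(seq_len(nrow(m)))
    }
    if (is.null(colnames(m))) {
        colnames(m) <- as.character(seq_len(ncol(m)))
    }
    m
}

# A matrix without dimnames, like the output of a counting function.
counts <- name_by_position(matrix(0, 10, 5))
colnames(counts)
# "1" "2" "3" "4" "5"
```

The names stay short, so the assay columns are not spaced out when printed.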
I concede that the example of se[,2] <- se[,1] wasn't the most
realistic; it came about from some unit tests I was using to check
subset replacement behaviour, and it was failing once I threw in column
On 05/12/15 18:35, Morgan, Martin wrote:
> The philosophy motivating the check is that names make the relationship between samples and data explicit, rather than relying on fragile positional information. With this in mind, I wonder why your upstream workflow does not include dimnames on the matrix?
> That said, the check was introduced in
> r68053 | mtmorgan at fhcrc.org | 2012-07-27 03:35:55 -0400 (Fri, 27 Jul 2012) | 2 lines
> SummarizedExperiment uses rowData=GRangesList() as defult
> To the observations you mention below one could also add that the rownames() can be NULL, so there is an uncomfortable asymmetry.
> I could (1) remove the check (but use the DataFrame() constructor in an admittedly hackish way, not wanting to rely on the internal new() function). I could also (2) construct row / column names as seq_len(nrow()) / seq_len(ncol()).
> Or (3) the code could be tightened to more closely adhere to the philosophy above (for instance, I think duplication of columns implied by se[,2] = se[,1] is worth stop()ing over, and allowing colnames(se) = NULL only enables bad practice). Likely this would be disruptive.
> For what it's worth, we have
>> eset = ExpressionSet(matrix(0, 1, 2))
>  "1"
>  "1" "2"
>> colnames(eset) = NULL
> Error in `sampleNames<-`(`*tmp*`, value = NULL) :
> 'value' length (0) must equal sample number in AssayData (2)
> so dimnames are being imposed.
> (2) would be my current compromise preference.
> From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Aaron Lun [alun at wehi.edu.au]
> Sent: Saturday, December 05, 2015 7:36 AM
> To: bioc-devel
> Subject: Re: [Bioc-devel] do SummarizedExperiments really need colnames?
> Hello all,
> At the start of the SummarizedExperiment constructor, there's a code
> block that throws an error if 'colData' is not specified and the assay
> matrices don't have column names.
> Is this really necessary? In many cases, I just want to get a matrix
> into the SE0 object without having to worry about column names. It
> doesn't seem like there's a requirement for this in the SE0 class,
> either; it seems happy with 'colnames(se0) <- NULL', and setting
> 'colData' to a 'DataFrame' with 'NULL' row names doesn't break the
> The requirement for column names causes issues for some manipulations -
> for example:
> out <- SummarizedExperiment(matrix(0, 10, 5),
> out[,1] <- out[,2]
> ## Error in `rownames<-`(`*tmp*`, value = c("2", "2", "3", "4", "5")) :
> ## duplicate rownames not allowed
> While this is fair enough, it's a bit annoying if I didn't want or need
> the names in the first place.
> The error mentioned above precedes the construction of the missing
> 'colData', so if column names are missing, then a more general way to
> construct the 'colData' would be to do 'new("DataFrame", nrows=ncol(assays))'.