[Bioc-devel] do SummarizedExperiments really need colnames?

Sat Dec 5 19:35:35 CET 2015

The philosophy motivating the check is that names make the relationship between samples and data explicit, rather than relying on fragile positional information. With this in mind, I wonder why your upstream workflow does not include dimnames on the matrix?

That said, the check was introduced in

------------------------------------------------------------------------
r68053 | mtmorgan at fhcrc.org | 2012-07-27 03:35:55 -0400 (Fri, 27 Jul 2012) | 2 lines

SummarizedExperiment uses rowData=GRangesList() as defult

------------------------------------------------------------------------

To the observations you mention below one could also add that rownames() can be NULL, so there is an uncomfortable asymmetry between rows and columns.

I could (1) remove the check (but use the DataFrame() constructor in an admittedly hackish way, not wanting to rely on the internal new() function). I could also (2) construct row / column names as seq_len(nrow()) / seq_len(ncol()).

Or (3) the code could be tightened to more closely adhere to the philosophy above (for instance, I think duplication of columns implied by se[,2] = se[,1] is worth stop()ing over, and allowing colnames(se) = NULL only enables bad practice). Likely this would be disruptive.

For what it's worth, we have

> library(Biobase)
> eset = ExpressionSet(matrix(0, 1, 2))
> dimnames(eset)
[[1]]
[1] "1"

[[2]]
[1] "1" "2"
> colnames(eset) = NULL
Error in `sampleNames<-`(`*tmp*`, value = NULL) : 
  'value' length (0) must equal sample number in AssayData (2)

so dimnames are being imposed.

(2) would be my current compromise preference.
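
A minimal sketch of what (2) amounts to, in base R only and on a plain matrix. The helper name default_dimnames is hypothetical and not part of SummarizedExperiment; it just fills in positional names where none exist and leaves existing names untouched:

```r
# Hypothetical helper (not part of any package): fill in missing
# row / column names with their positions, as in option (2).
default_dimnames <- function(m) {
    if (is.null(rownames(m)))
        rownames(m) <- as.character(seq_len(nrow(m)))
    if (is.null(colnames(m)))
        colnames(m) <- as.character(seq_len(ncol(m)))
    m
}

m <- default_dimnames(matrix(0, 10, 5))
colnames(m)
## [1] "1" "2" "3" "4" "5"
```

The names are still positional, so this does not buy the explicitness argued for above; it only guarantees that some names are always present.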

Martin
________________________________________
From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Aaron Lun [alun at wehi.edu.au]
Sent: Saturday, December 05, 2015 7:36 AM
To: bioc-devel
Subject: Re: [Bioc-devel] do SummarizedExperiments really need colnames?

Hello all,

At the start of the SummarizedExperiment constructor, there's a code
block that throws an error if 'colData' is not specified and the assay
matrices don't have column names.

Is this really necessary? In many cases, I just want to get a matrix
into the SE0 object without having to worry about column names. It
doesn't seem like there's a requirement for this in the SE0 class,
either; it seems happy with 'colnames(se0) <- NULL', and setting
'colData' to a 'DataFrame' with 'NULL' row names doesn't break the
constructor.

The requirement for column names causes issues for some manipulations -
for example:

out <- SummarizedExperiment(matrix(0, 10, 5),
    colData=DataFrame(row.names=1:5))
out[,1] <- out[,2]

## Error in `rownames<-`(`*tmp*`, value = c("2", "2", "3", "4", "5")) :
##  duplicate rownames not allowed

While this is fair enough, it's a bit annoying if I didn't want or need
the names in the first place.

The error mentioned above precedes the construction of the missing
'colData', so if column names are missing, then a more general way to
construct the 'colData' would be to do 'new("DataFrame", nrows=ncol(assays))'.

Cheers,

Aaron
