[Bioc-devel] dimnames of multidimensional assays in SummarizedExperiment

Hervé Pagès hpages at fredhutch.org
Wed Feb 24 22:25:47 CET 2016


Hi Pete,

Sorry for the delay.

On 02/10/2016 12:33 PM, Peter Hickey wrote:
> The assays slot in a SummarizedExperiment object supports elements
> with up to 4 dimensions [*]
>
> library(SummarizedExperiment)
> makeSE <- function(n) {
>    assay <- array(1:2^n,

What? `^` has precedence over `:` ? Amazing...

>                   dim = rep(2, n),
>                   dimnames = split(letters[1:(2 * n)], seq_len(n)))
>    SummarizedExperiment(assay)
> }
> x <- makeSE(4)
>
> However, the "higher-order" dimnames of the assays aren't preserved
> when calling the `assays` or `assay` getters:
>
>> dimnames(assay(x, withDimnames = TRUE))
> [[1]]
> [1] "a" "e"
>
> [[2]]
> [1] "b" "f"
>
> [[3]]
> NULL
>
> [[4]]
> NULL
>
> This is despite the data still being available in the assays slot:
>
>> dimnames(x@assays[[1]])
> $`1`
> [1] "a" "e"
>
> $`2`
> [1] "b" "f"
>
> $`3`
> [1] "c" "g"
>
> $`4`
> [1] "d" "h"
>
> The following patch fixes this by only touching the rownames and
> colnames and not touching the "higher-order" dimnames. Seem
> reasonable?
>
> Index: R/SummarizedExperiment-class.R
> ===================================================================
> --- R/SummarizedExperiment-class.R (revision 113505)
> +++ R/SummarizedExperiment-class.R (working copy)
> @@ -174,7 +174,10 @@
>  {
>      assays <- as(x@assays, "SimpleList")
>      if (withDimnames)
> -        endoapply(assays, "dimnames<-", dimnames(x))
> +        endoapply(assays, function(assay) {
> +            dimnames(assay)[1:2] <- dimnames(x)
> +            assay
> +        })
>      else
>          assays
>  })

Thanks for catching this and providing a patch. I just applied it (in
SummarizedExperiment 1.1.21) and added some tests to cover this.
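
To illustrate what the patch changes, here is a minimal sketch on a plain
base R array, with no SummarizedExperiment object involved. The names `rn`
and `cn` are made up for the example and only stand in for the rownames and
colnames that dimnames(x) would supply:

```r
# Same 4-D array as in makeSE(4) above, built directly in base R.
assay <- array(1:2^4,
               dim = rep(2, 4),
               dimnames = split(letters[1:8], seq_len(4)))

# Hypothetical row and column names (assumption: stand-ins for dimnames(x)).
rn <- c("gene1", "gene2")
cn <- c("sample1", "sample2")

# Replace only the first two components of the dimnames list,
# leaving the "higher-order" dimnames untouched.
dimnames(assay)[1:2] <- list(rn, cn)

dimnames(assay)[[3]]  # still "c" "g"
dimnames(assay)[[4]]  # still "d" "h"
```

Only the first two components are overwritten, so the third and fourth
dimnames survive the assignment.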

>
> [*] In fact, the assay elements can have more than 4 dimensions when
> constructed, although subsetting with `[` isn't supported (possibly
> things other than subsetting break as well in this case).
>
> # No error
> y <- makeSE(5)
> y
>
> # Error
> y[1, ]
>
> Perhaps there should be a check in the constructor that all assay
> elements have < 5 dimensions?

The early checking/validation of an SE object is an interesting topic
that is open for discussion. This is something we've discussed
internally with Martin and it seems that there is some benefit in
not enforcing the full SummarizedExperiment API contract upfront.
For example, you could imagine that someone wants to stick a 5-D
assay in an SE object but doesn't have the need to subset the object.
Or that someone wants to stick in a 2-D assay that doesn't even
support subsetting, or doesn't support cbind() or rbind() (in which
case trying to subset the SE object, or to cbind() or rbind() it
with another SE object, will fail).

It actually seems to be a good feature that people can stick almost
anything in the assays slot of an SE object, as long as the individual
assay objects support dim(), dimnames(), and dimnames<-. These are the
minimum requirements and they give you an SE object with minimal
capabilities. The full requirements (i.e. the above plus [, [<-, rbind,
cbind) give you an SE object with full capabilities.

On-disk objects (e.g. in HDF5 format) are a good example of objects
that won't give you SE objects with full capabilities but enough
capabilities for some common workflows. FWIW I'm currently working on
an implementation of HDF5Matrix and HDF5Array objects that will support
dim(), dimnames(), dimnames<-, and [. They won't support [<-, rbind,
and cbind but these capabilities are not needed by popular workflows
like the DESeq2 vignette.

That being said, I don't know the reason for the current 4 dimensions
limit of subsetting. Sounds kind of arbitrary. Maybe we should just
support subsetting of assays with any number of dimensions.
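
As a rough sketch of how that could work for ordinary in-memory arrays
(this is not how SummarizedExperiment implements `[`, and the helper name
`subset_first_two` is made up for the example), the call to `[` can be
assembled with do.call() so that every dimension beyond the first two is
kept whole:

```r
# Hypothetical helper: subset rows (i) and columns (j) of an array with
# 2 or more dimensions, keeping all higher dimensions in full.
subset_first_two <- function(a, i, j) {
    n_extra <- length(dim(a)) - 2L
    # One TRUE per extra dimension selects everything along it.
    args <- c(list(a, i, j), rep(list(TRUE), n_extra), list(drop = FALSE))
    do.call(`[`, args)
}

# A 5-D array, like the assay inside makeSE(5).
a <- array(seq_len(2^5), dim = rep(2, 5))
s <- subset_first_two(a, 1, TRUE)
dim(s)  # 1 2 2 2 2
```

Using drop = FALSE keeps the result at the same number of dimensions as
the input, which is what an SE object would need in order to stay
consistent after subsetting.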

Cheers,
H.

>
> Cheers,
> Pete
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


