[Bioc-devel] Behaviour of rbind/cbind on assays slot of SummarizedExperiment with multidimensional assays

Hervé Pagès hpages at fredhutch.org
Thu Mar 3 19:56:05 CET 2016


On 03/03/2016 10:49 AM, Peter Hickey wrote:
> Hi Herve,
>
> I agree, the abind::abind() signature is rather verbose and much of it is not
> required in the context of a SummarizedExperiment. Perhaps "overriding"
> abind::abind() with an S4 generic with a different signature isn't a good idea
> and it would be better to have our own generic.
>
> I quite like arbind() and acbind() as names. I guess these would live in the
> SummarizedExperiment package?

Yes.
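
For what it's worth, here is a rough sketch of what such generics could
look like. It uses only base R and the methods package. The helper
.bind_along() and the "array" methods are hypothetical names written for
illustration only, not code that exists in SummarizedExperiment, and the
sketch ignores dimnames:

```r
library(methods)

# Hypothetical helper (not part of any package): bind arrays along
# dimension `along` without depending on the abind package.
.bind_along <- function(objects, along) {
    dims <- lapply(objects, dim)
    ndim <- length(dims[[1L]])
    stopifnot(all(lengths(dims) == ndim), along <= ndim)
    # Every dimension except `along` must agree across the objects.
    other <- lapply(dims, function(d) d[-along])
    stopifnot(all(vapply(other, identical, logical(1), other[[1L]])))
    # Move `along` to the last position so that plain concatenation of
    # the underlying vectors stacks the arrays along that dimension.
    perm <- c(seq_len(ndim)[-along], along)
    flat <- unlist(lapply(objects, aperm, perm = perm), use.names = FALSE)
    n_along <- sum(vapply(dims, `[`, integer(1), along))
    ans <- array(flat, dim = c(dims[[1L]][-along], n_along))
    # Restore the original order of the dimensions.
    aperm(ans, order(perm))
}

# Assumed generics: "assay rbind" and "assay cbind", dispatching on `...`.
setGeneric("arbind", function(...) standardGeneric("arbind"))
setGeneric("acbind", function(...) standardGeneric("acbind"))

# Illustrative methods for ordinary arrays with more than 2 dimensions.
setMethod("arbind", "array",
          function(...) .bind_along(list(...), along = 1L))
setMethod("acbind", "array",
          function(...) .bind_along(list(...), along = 2L))

x <- array(seq_len(100), dim = c(4, 5, 5))
dim(arbind(x, x))  # 8 5 5
dim(acbind(x, x))  # 4 10 5
```

With something along these lines, another array-like class would only
need to supply its own arbind() and acbind() methods. Note that S4
dispatch on "..." expects all the objects passed to be of the same class
(or subclasses of it).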

>
> Happy to do further work on this but I won't have time until the weekend or
> next week.

Thanks for offering. No rush.

H.

>
> Cheers,
> Pete
>
> On Thu, 3 Mar 2016 at 13:31 Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>> Hi Pete,
>>
>> On 03/02/2016 12:42 PM, Peter Hickey wrote:
>>> This is mostly directed to Herve and/or Martin, but I'd be interested
>>> in others' input too.
>>>
>>> The SummarizedExperiment package defines rbind,Assays-method and
>>> cbind,Assays-method that are called when rbind() or cbind() is called
>>> on a SummarizedExperiment object. In the case of a two-dimensional assay
>>> (a matrix), these work much as if rbind/cbind were called on the matrix:
>>>
>>>> library(SummarizedExperiment)
>>>> m <- matrix(rnorm(100), nrow = 4, ncol = 25)
>>>> se1 <- SummarizedExperiment(m)
>>>> dim(assay(rbind(se1, se1)))
>>> [1]  8 25
>>>> dim(rbind(assay(se1), assay(se1)))
>>> [1]  8 25
>>>> dim(assay(cbind(se1, se1)))
>>> [1]  4 50
>>>> dim(cbind(assay(se1), assay(se1)))
>>> [1]  4 50
>>>
>>> When an assay is an array with more than 2 dimensions, however, the
>>> result of the rbind,Assays-method (resp. cbind,Assays-method) differs
>>> from the rbind,array-method (resp. cbind,array-method). This is for a
>>> good reason because it preserves the dimensionality of the assay in
>>> the SummarizedExperiment object. So in fact the "rbind(...)" of the
>>> assay is more like abind::abind(..., along = 1) and the "cbind(...)"
>>> of the assay is more like abind::abind(..., along = 2):
>>>
>>>> x <- array(rnorm(100), dim = c(4, 5, 5))
>>>> se2 <- SummarizedExperiment(x)
>>>> dim(assay(rbind(se2, se2)))
>>> [1] 8 5 5
>>>> dim(rbind(assay(se2), assay(se2)))
>>> [1]   2 100
>>>> dim(abind::abind(assay(se2), assay(se2), along = 1))
>>> [1] 8 5 5
>>>> identical(assay(rbind(se2, se2)), abind::abind(assay(se2), assay(se2), along = 1))
>>> [1] TRUE
>>>> dim(assay(cbind(se2, se2)))
>>> [1]  4 10  5
>>>> dim(cbind(assay(se2), assay(se2)))
>>> [1] 100   2
>>>> dim(abind::abind(assay(se2), assay(se2), along = 2))
>>> [1]  4 10  5
>>>> identical(assay(cbind(se2, se2)), abind::abind(assay(se2), assay(se2), along = 2))
>>> [1] TRUE
>>>
>>> rbind/cbind does not work for other "array-like" objects with > 2
>>> dimensions in the assays slot of a SummarizedExperiment because the
>>> internal function SummarizedExperiment:::.bind_assay_elements()
>>> constructs a new array via array() if the assay has more than 2
>>> dimensions, thus destroying the original class of the array-like
>>> object.
>>>
>>> What I'm wondering is whether there is a way to generalise rbind/cbind
>>> of Assays to other array-like objects provided that they have a suitable
>>> method defined. It seems to me that a good candidate would be to
>>> require that an object in the assays slot has an abind(..., along = 1)
>>> and abind(..., along = 2) method defined if it has more than 2
>>> dimensions. It might even be worth using abind::abind() when the
>>> assay is an array with more than 2 dimensions to simplify the code
>>> somewhat.
>>>
>>> Thoughts? I'd be happy to work on a patch.
>>
>> Requiring that abind(..., along=1) and abind(..., along=2) work on
>> assays of dim > 2 would work. Note that abind() has a complicated
>> signature (many extra arguments) but the "abind" methods that one
>> would need to implement wouldn't need to satisfy the full abind()
>> contract (in the context of SummarizedExperiment assays, satisfying
>> the full contract is not needed and would be too much work).
>>
>> Alternatively, we can introduce our own generics for that, e.g.
>> abind1() and abind2(), or arbind() and acbind() (for "assay rbind"
>> and "assay cbind"). Advantages: the signatures would be cleaner,
>> the contracts simpler, and the methods easier to implement. Also
>> we wouldn't need to depend on the abind package.
>>
>> What do you think?
>>
>> H.
>>
>>>
>>> Cheers,
>>> Pete
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319



