[Bioc-devel] Unexpected behaviour with Assays and Vector classes
Aaron Lun
alun at wehi.edu.au
Sun Nov 15 19:59:44 CET 2015
Hello all,
I've encountered some unexpected behaviour in two of the base classes
while developing code for genomic interactions. The first issue concerns
subset replacement in the Vector class. Say I define a derived class
"foo" inheriting from Vector, as below:
> require(S4Vectors)
> setClass("foo", contains="Vector", slots=c(blah="integer"))
> setMethod("parallelSlotNames", "foo", function(x) {
+     c("blah", callNextMethod())
+ })
[1] "parallelSlotNames"
> setMethod("c", "foo", function(x, ..., recursive=TRUE) {
+     new.blah <- do.call(c, lapply(list(x, ...), FUN=slot, name="blah"))
+     new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols))
+     new("foo", blah=new.blah, metadata=metadata(x),
+         elementMetadata=new.mcols)
+ })
[1] "c"
Construction gives what you'd expect:
> a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5))
> a@blah
[1] 1 2 3 4 5
> mcols(a)$stuff
[1] 1 2 3 4 5
However, if I try to do subset replacement, I get this:
> a[1] <- a[2]
> a@blah
[1] 2 2 3 4 5
> mcols(a)$stuff
[1] 1 2 3 4 5
So, "blah" is replaced properly, but "elementMetadata" is not. This is
attributable to a line in "replaceROWS" which preserves the mcols of the
original object during replacement (also for "names"). Should this line
be removed to give expected behaviour for the elementMetadata?
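For illustration, the expected behaviour can be mimicked with a toy class that does not use S4Vectors at all. This is only a sketch under stated assumptions: the class "toy", its slots and the helper replace_parallel() are hypothetical names made up here, with the "stuff" slot standing in for a metadata column. It is not how replaceROWS is implemented; it only shows every parallel slot being replaced at the same index:

```r
library(methods)

# Hypothetical stand-in for a Vector subclass: two parallel integer slots.
setClass("toy", slots = c(blah = "integer", stuff = "integer"))

# Hypothetical helper: replace element(s) i of every parallel slot of x
# with the corresponding slot contents of value.
replace_parallel <- function(x, i, value, slots = c("blah", "stuff")) {
    for (s in slots) {
        tmp <- slot(x, s)
        tmp[i] <- slot(value, s)
        slot(x, s) <- tmp
    }
    x
}

a <- new("toy", blah = 1:5, stuff = 1:5)

# Equivalent in spirit to a[1] <- a[2]: take element 2 of each slot.
second <- new("toy", blah = a@blah[2], stuff = a@stuff[2])
b <- replace_parallel(a, 1L, second)

b@blah   # 2 2 3 4 5
b@stuff  # 2 2 3 4 5
```

Here both slots change together, which is what one would expect for "blah" and the mcols in the example above.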
The other issue is that r/cbind'ing doesn't seem to work properly for
unnamed multi-matrix Assays objects. Consider:
> require(SummarizedExperiment)
> whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4)))
> whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4)))
> rbind(whee, whee2)
Reference class object of class "ShallowSimpleListAssays"
Field "data":
List of length 2
names(2): x1 x2
>
> names(whee) <- names(whee2) <- NULL
> rbind(whee, whee2)
Reference class object of class "ShallowSimpleListAssays"
Field "data":
List of length 1
So, unnaming and rbind'ing results in the loss of a matrix. This is the
same issue I reported for unnamed multi-matrix assays when rbinding
multiple SummarizedExperiment objects; I recall that being resolved by
r/cbind'ing based on position. Should this be done here as well? If not,
perhaps we should force people to name their assays.
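To make "r/cbind'ing based on position" concrete, here is a minimal sketch using plain base R lists of matrices rather than Assays objects. The variable names are illustrative only, and this is not the SummarizedExperiment implementation; it just shows that pairing the matrices by position keeps both of them when there are no names:

```r
# Two unnamed lists of matrices, standing in for unnamed multi-matrix assays.
x <- list(matrix(1, 3, 4), matrix(2, 3, 4))
y <- list(matrix(3, 3, 4), matrix(4, 3, 4))

# Positional binding only makes sense if both lists hold the same
# number of matrices.
stopifnot(length(x) == length(y))

# Pair the i-th matrix of x with the i-th matrix of y and rbind each pair.
combined <- Map(rbind, x, y)

length(combined)     # 2
dim(combined[[1]])   # 6 4
```

Swapping rbind for cbind in the Map() call gives the column-wise equivalent.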
Cheers,
Aaron
> sessionInfo()
R Under development (unstable) (2015-10-30 r69588)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] SummarizedExperiment_1.1.2 Biobase_2.31.0
[3] GenomicRanges_1.23.3 GenomeInfoDb_1.7.3
[5] IRanges_2.5.5 S4Vectors_0.9.8
[7] BiocGenerics_0.17.1
loaded via a namespace (and not attached):
[1] zlibbioc_1.17.0 XVector_0.11.0