[Bioc-devel] Unexpected behaviour with Assays and Vector classes
Hervé Pagès
hpages at fredhutch.org
Sun Nov 15 20:54:28 CET 2015
Hi Aaron,
On 11/15/2015 10:59 AM, Aaron Lun wrote:
> Hello all,
>
> I've encountered some unexpected behaviour with some of the base classes
> while developing stuff for genomic interactions. The first issue lies
> with the subset replacement in the Vector class. Let's say I make a
> derived class "foo" inheriting from Vector, as below:
>
> > require(S4Vectors)
> > setClass("foo", contains="Vector", slots=c(blah="integer"))
> > setMethod("parallelSlotNames", "foo", function(x) {
> >     c("blah", callNextMethod())
> > })
> [1] "parallelSlotNames"
> > setMethod("c", "foo", function(x, ..., recursive=TRUE) {
> >     new.blah <- do.call(c, lapply(list(x, ...), FUN=slot, name="blah"))
> >     new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols))
> >     new("foo", blah=new.blah, metadata=metadata(x),
> >         elementMetadata=new.mcols)
> > })
> [1] "c"
>
> Construction gives what you'd expect:
>
> > a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5))
> > a@blah
> [1] 1 2 3 4 5
> > mcols(a)$stuff
> [1] 1 2 3 4 5
>
> However, if I try to do subset replacement, I get this:
>
> > a[1] <- a[2]
> > a@blah
> [1] 2 2 3 4 5
> > mcols(a)$stuff
> [1] 1 2 3 4 5
>
> So, "blah" is replaced properly, but "elementMetadata" is not. This is
> attributable to a line in "replaceROWS" which preserves the mcols of the
> original object during replacement (also for "names"). Should this line
> be removed to give expected behaviour for the elementMetadata?
For metadata columns, we usually mimic how names are treated in
base R:
> x <- c(a=1, b=2, c=3, d=4)
> x[1] <- x[2]
> x
a b c d
2 2 3 4
The names are not affected.
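A small base R sketch (no Bioconductor packages needed) of the same point: "[<-" replaces the values and leaves the names attribute alone. Carrying the names along as well has to be done by hand, and it leaves the vector with duplicated names. The object names "x" and "y" are just for illustration.

```r
# Base R: subset replacement changes the values, not the names.
x <- c(a = 1, b = 2, c = 3, d = 4)
x[1] <- x[2]
print(x)          # names are still a, b, c, d; values are 2, 2, 3, 4

# Moving the names too must be done explicitly, which yields a
# duplicated name ("b" appears twice).
y <- c(a = 1, b = 2, c = 3, d = 4)
y[1] <- y[2]
names(y)[1] <- names(y)[2]
print(names(y))   # "b" "b" "c" "d"
```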
IRanges objects follow that model:
> library(IRanges)
> ir <- IRanges(11:14, 20, names=letters[1:4])
> mcols(ir) <- DataFrame(stuff=1:4)
> ir
IRanges of length 4
start end width names
[1] 11 20 10 a
[2] 12 20 9 b
[3] 13 20 8 c
[4] 14 20 7 d
> mcols(ir)
DataFrame with 4 rows and 1 column
stuff
<integer>
1 1
2 2
3 3
4 4
> ir[1] <- ir[2]
> ir
IRanges of length 4
start end width names
[1] 12 20 9 a
[2] 12 20 9 b
[3] 13 20 8 c
[4] 14 20 7 d
> mcols(ir)
DataFrame with 4 rows and 1 column
stuff
<integer>
1 1
2 2
3 3
4 4
However, GRanges objects do not seem to:
> gr <- GRanges("chr1", ir)
> gr
GRanges object with 4 ranges and 1 metadata column:
seqnames ranges strand | stuff
<Rle> <IRanges> <Rle> | <integer>
a chr1 [12, 20] * | 1
b chr1 [12, 20] * | 2
c chr1 [13, 20] * | 3
d chr1 [14, 20] * | 4
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> gr[1] <- gr[2]
> gr
GRanges object with 4 ranges and 1 metadata column:
seqnames ranges strand | stuff
<Rle> <IRanges> <Rle> | <integer>
a chr1 [12, 20] * | 2
b chr1 [12, 20] * | 2
c chr1 [13, 20] * | 3
d chr1 [14, 20] * | 4
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
So we have an inconsistency among our Vector-based classes, and we
need to fix it. It seems that you expected the metadata columns to
be altered by [<-. Is that what most people would expect?
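The two candidate semantics can be illustrated without S4Vectors. This is only a toy sketch: the class "toyVec", its data.frame slot "mcols" and the helper "toy_replace()" are hypothetical stand-ins and not the Vector/replaceROWS code. With replace_mcols = FALSE the target keeps its own metadata (as IRanges does above); with TRUE the metadata rows are copied along with the values (as GRanges does above).

```r
library(methods)

# Hypothetical toy class: one parallel slot ("blah") plus per-element
# metadata stored as a plain data.frame ("mcols").
setClass("toyVec", slots = c(blah = "integer", mcols = "data.frame"))

# Hypothetical helper mimicking x[i] <- value under either semantics.
toy_replace <- function(x, i, value, replace_mcols = FALSE) {
    x@blah[i] <- value@blah              # values are always replaced
    if (replace_mcols)
        x@mcols[i, ] <- value@mcols      # optionally copy metadata rows
    x
}

a <- new("toyVec", blah = 1:5, mcols = data.frame(stuff = 1:5))
b <- new("toyVec", blah = 2L, mcols = data.frame(stuff = 2L))  # plays a[2]

keep <- toy_replace(a, 1, b)                        # IRanges-like
swap <- toy_replace(a, 1, b, replace_mcols = TRUE)  # GRanges-like
keep@mcols$stuff   # 1 2 3 4 5
swap@mcols$stuff   # 2 2 3 4 5
```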
>
> The other issue is that r/cbind'ing doesn't seem to work properly for
> unnamed multi-matrix Assays objects. Consider:
>
> > require(SummarizedExperiment)
> > whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4)))
> > whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4)))
> > rbind(whee, whee2)
> Reference class object of class "ShallowSimpleListAssays"
> Field "data":
> List of length 2
> names(2): x1 x2
> >
> > names(whee) <- names(whee2) <- NULL
> > rbind(whee, whee2)
> Reference class object of class "ShallowSimpleListAssays"
> Field "data":
> List of length 1
>
> So, unnaming and rbind'ing results in the loss of a matrix. This is the
> same issue I reported for unnamed multi-matrix assays when rbinding
> multiple SummarizedExperiment objects; I recall that being resolved by
> r/cbind'ing based on position. Should this be done here as well? If not,
> perhaps we should force people to name their assays.
Maybe Martin or Val can chime in for this one.
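A minimal sketch of the positional approach Aaron suggests, using plain lists of matrices rather than Assays objects. The helper "bind_assay_lists()" is hypothetical and is not how SummarizedExperiment implements rbind/cbind for Assays; it matches elements by name when both lists are named and by position otherwise, so no matrix is dropped when the names are NULL.

```r
# Hypothetical helper: bind two lists of assay matrices element by element.
bind_assay_lists <- function(a1, a2, FUN = rbind) {
    stopifnot(length(a1) == length(a2))
    if (!is.null(names(a1)) && !is.null(names(a2))) {
        stopifnot(setequal(names(a1), names(a2)))
        a2 <- a2[names(a1)]   # reorder a2 to follow a1's names
    }
    Map(FUN, a1, a2)          # pair the elements positionally
}

# Unnamed case: both matrices survive.
whee  <- list(matrix(1, 3, 4), matrix(2, 3, 4))
whee2 <- list(matrix(3, 3, 4), matrix(4, 3, 4))
res <- bind_assay_lists(whee, whee2)
length(res)      # 2
dim(res[[1]])    # 6 4

# Named case: matching is by name, even if the order differs.
n1 <- list(x1 = matrix(1, 3, 4), x2 = matrix(2, 3, 4))
n2 <- list(x2 = matrix(4, 3, 4), x1 = matrix(3, 3, 4))
res_named <- bind_assay_lists(n1, n2)
res_named$x1[4, 1]   # 3
```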
Thanks,
H.
>
> Cheers,
>
> Aaron
>
> > sessionInfo()
> R Under development (unstable) (2015-10-30 r69588)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: CentOS release 6.4 (Final)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] SummarizedExperiment_1.1.2 Biobase_2.31.0
> [3] GenomicRanges_1.23.3 GenomeInfoDb_1.7.3
> [5] IRanges_2.5.5 S4Vectors_0.9.8
> [7] BiocGenerics_0.17.1
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.17.0 XVector_0.11.0
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319