[Bioc-devel] Unexpected behaviour with Assays and Vector classes

Hervé Pagès hpages at fredhutch.org
Sun Nov 15 20:54:28 CET 2015


Hi Aaron,

On 11/15/2015 10:59 AM, Aaron Lun wrote:
> Hello all,
>
> I've encountered some unexpected behaviour with some of the base classes
> while developing stuff for genomic interactions. The first issue lies
> with the subset replacement in the Vector class. Let's say I make a
> derived class "foo" inheriting from Vector, as below:
>
>  > require(S4Vectors)
>  > setClass("foo", contains="Vector", slots=c(blah="integer"))
>  > setMethod("parallelSlotNames", "foo", function(x) {
>  >     c("blah", callNextMethod())
>  > })
> [1] "parallelSlotNames"
>  > setMethod("c", "foo", function(x, ..., recursive=TRUE) {
>  >    new.blah <- do.call(c, lapply(list(x, ...), FUN=slot, name="blah"))
>  >    new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols))
>  >    new("foo", blah=new.blah, metadata=metadata(x),
>  >        elementMetadata=new.mcols)
>  > })
> [1] "c"
>
> Construction gives what you'd expect:
>
>  > a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5))
>  > a at blah
> [1] 1 2 3 4 5
>  > mcols(a)$stuff
> [1] 1 2 3 4 5
>
> However, if I try to do subset replacement, I get this:
>
>  > a[1] <- a[2]
>  > a at blah
> [1] 2 2 3 4 5
>  > mcols(a)$stuff
> [1] 1 2 3 4 5
>
> So, "blah" is replaced properly, but "elementMetadata" is not. This is
> attributable to a line in "replaceROWS" which preserves the mcols of the
> original object during replacement (also for "names"). Should this line
> be removed to give expected behaviour for the elementMetadata?

For the treatment of the metadata columns we are usually mimicking
how names are treated in base R:

 > x <- c(a=1, b=2, c=3, d=4)
 > x[1] <- x[2]
 > x
a b c d
2 2 3 4

The names are not affected.

IRanges objects are following that model:

 > library(IRanges)
 > ir <- IRanges(11:14, 20, names=letters[1:4])
 > mcols(ir) <- DataFrame(stuff=1:4)
 > ir
IRanges of length 4
     start end width names
[1]    11  20    10     a
[2]    12  20     9     b
[3]    13  20     8     c
[4]    14  20     7     d
 > mcols(ir)
DataFrame with 4 rows and 1 column
       stuff
   <integer>
1         1
2         2
3         3
4         4
 > ir[1] <- ir[2]
 > ir
IRanges of length 4
     start end width names
[1]    12  20     9     a
[2]    12  20     9     b
[3]    13  20     8     c
[4]    14  20     7     d
 > mcols(ir)
DataFrame with 4 rows and 1 column
       stuff
   <integer>
1         1
2         2
3         3
4         4

However it seems that GRanges objects are not:

 > gr <- GRanges("chr1", ir)
 > gr
GRanges object with 4 ranges and 1 metadata column:
     seqnames    ranges strand |     stuff
        <Rle> <IRanges>  <Rle> | <integer>
   a     chr1  [12, 20]      * |         1
   b     chr1  [12, 20]      * |         2
   c     chr1  [13, 20]      * |         3
   d     chr1  [14, 20]      * |         4
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths
 > gr[1] <- gr[2]
 > gr
GRanges object with 4 ranges and 1 metadata column:
     seqnames    ranges strand |     stuff
        <Rle> <IRanges>  <Rle> | <integer>
   a     chr1  [12, 20]      * |         2
   b     chr1  [12, 20]      * |         2
   c     chr1  [13, 20]      * |         3
   d     chr1  [14, 20]      * |         4
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

So we have an inconsistency within our Vector-based classes.
We need to fix that. It seems that you would have expected the
metadata columns to be altered by [<-. Is this what most people feel?

>
> The other issue is that r/cbind'ing doesn't seem to work properly for
> unnamed multi-matrix Assays objects. Consider:
>
>  > require(SummarizedExperiment)
>  > whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4)))
>  > whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4)))
>  > rbind(whee, whee2)
> Reference class object of class "ShallowSimpleListAssays"
> Field "data":
> List of length 2
> names(2): x1 x2
>  >
>  > names(whee) <- names(whee2) <- NULL
>  > rbind(whee, whee2)
> Reference class object of class "ShallowSimpleListAssays"
> Field "data":
> List of length 1
>
> So, unnaming and rbind'ing results in the loss of a matrix. This is the
> same issue I reported for unnamed multi-matrix assays when rbinding
> multiple SummarizedExperiment objects; I recall that being resolved by
> r/cbind'ing based on position. Should this be done here as well? If not,
> perhaps we should force people to name their assays.

Maybe Martin or Val can chime in for this one.

Thanks,
H.

>
> Cheers,
>
> Aaron
>
>  > sessionInfo()
> R Under development (unstable) (2015-10-30 r69588)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: CentOS release 6.4 (Final)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] SummarizedExperiment_1.1.2 Biobase_2.31.0
> [3] GenomicRanges_1.23.3       GenomeInfoDb_1.7.3
> [5] IRanges_2.5.5              S4Vectors_0.9.8
> [7] BiocGenerics_0.17.1
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.17.0 XVector_0.11.0
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list