[Bioc-devel] Unexpected behaviour with Assays and Vector classes
Hervé Pagès
hpages at fredhutch.org
Tue Nov 17 08:39:50 CET 2015
Hi Aaron,
On 11/16/2015 09:39 AM, Hervé Pagès wrote:
> Hi Aaron,
>
> On 11/15/2015 12:37 PM, Aaron Lun wrote:
>> Hi Herve,
>>
>> I would have expected GRanges behaviour, where the metadata is affected
>> by the replacement. For example, with GRanges objects, I often use the
>> metadata to store statistics or descriptors relevant to each genomic
>> interval, e.g., peak scores, log-fold changes and so on. If I did a
>> replacement to change the genomic coordinates, I would expect that the
>> associated metadata would come along for the ride. The same argument
>> would be applied to other classes derived from Vector.
>>
>> I should add that I encountered this problem in the context of doing
>> subset replacement for metadata. To re-use your IRanges example:
>>
>>> require(IRanges)
>>> ir <- IRanges(11:14, 20, names=letters[1:4])
>>> mcols(ir) <- DataFrame(stuff=1:4)
>>> mcols(ir[1])$stuff <- 2
>>> mcols(ir)$stuff
>> [1] 1 2 3 4
>>
>> I would expect that the first entry of "stuff" would now be 2. But
>> because metadata doesn't get replaced during subset replacement, it
>> remains at 1. This is problematic, because if I wanted to update a
>> subset of entries in the metadata, I would use something similar to the
>> above replacement and expect it to change the existing object.
>
> Good point. The same argument could actually be used for the names:
>
> > x <- c(a=1, b=2, c=3, d=4)
> > names(x[1]) <- "ZZ"
> > x
> a b c d
> 1 2 3 4
>
> Anyway, you convinced me that we should not try to follow too closely
> the names model for how we treat metadata columns. If nobody objects,
> I'll change the behavior of the various "replaceROWS" methods in the
> S4Vectors stack and make sure that they all transfer the metadata
> columns from 'value' to 'x'.
This is done in S4Vectors 0.9.9 and IRanges 2.5.6 in devel, and in
S4Vectors 0.8.3 and IRanges 2.4.2 in release.
Cheers,
H.
>
> Cheers,
> H.
>
>>
>> Cheers,
>>
>> Aaron
>>
>> Hervé Pagès wrote:
>>> Hi Aaron,
>>>
>>> On 11/15/2015 10:59 AM, Aaron Lun wrote:
>>>> Hello all,
>>>>
>>>> I've encountered some unexpected behaviour with some of the base
>>>> classes
>>>> while developing stuff for genomic interactions. The first issue lies
>>>> with the subset replacement in the Vector class. Let's say I make a
>>>> derived class "foo" inheriting from Vector, as below:
>>>>
>>>> > require(S4Vectors)
>>>> > setClass("foo", contains="Vector", slots=c(blah="integer"))
>>>> > setMethod("parallelSlotNames", "foo", function(x) {
>>>> > c("blah", callNextMethod())
>>>> > })
>>>> [1] "parallelSlotNames"
>>>> > setMethod("c", "foo", function(x, ..., recursive=TRUE) {
>>>> > new.blah <- do.call(c, lapply(list(x, ...), FUN=slot,
>>>> name="blah"))
>>>> > new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols))
>>>> > new("foo", blah=new.blah, metadata=metadata(x),
>>>> > elementMetadata=new.mcols)
>>>> > })
>>>> [1] "c"
>>>>
>>>> Construction gives what you'd expect:
>>>>
>>>> > a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5))
>>>> > a at blah
>>>> [1] 1 2 3 4 5
>>>> > mcols(a)$stuff
>>>> [1] 1 2 3 4 5
>>>>
>>>> However, if I try to do subset replacement, I get this:
>>>>
>>>> > a[1] <- a[2]
>>>> > a at blah
>>>> [1] 2 2 3 4 5
>>>> > mcols(a)$stuff
>>>> [1] 1 2 3 4 5
>>>>
>>>> So, "blah" is replaced properly, but "elementMetadata" is not. This is
>>>> attributable to a line in "replaceROWS" which preserves the mcols of
>>>> the
>>>> original object during replacement (also for "names"). Should this line
>>>> be removed to give expected behaviour for the elementMetadata?
>>>
>>> For the treatment of the metadata columns we are usually mimicking
>>> how names are treated in base R:
>>>
>>>> x <- c(a=1, b=2, c=3, d=4)
>>>> x[1] <- x[2]
>>>> x
>>> a b c d
>>> 2 2 3 4
>>>
>>> The names are not affected.
>>>
>>> IRanges objects are following that model:
>>>
>>>> library(IRanges)
>>>> ir <- IRanges(11:14, 20, names=letters[1:4])
>>>> mcols(ir) <- DataFrame(stuff=1:4)
>>>> ir
>>> IRanges of length 4
>>> start end width names
>>> [1] 11 20 10 a
>>> [2] 12 20 9 b
>>> [3] 13 20 8 c
>>> [4] 14 20 7 d
>>>> mcols(ir)
>>> DataFrame with 4 rows and 1 column
>>> stuff
>>> <integer>
>>> 1 1
>>> 2 2
>>> 3 3
>>> 4 4
>>>> ir[1] <- ir[2]
>>>> ir
>>> IRanges of length 4
>>> start end width names
>>> [1] 12 20 9 a
>>> [2] 12 20 9 b
>>> [3] 13 20 8 c
>>> [4] 14 20 7 d
>>>> mcols(ir)
>>> DataFrame with 4 rows and 1 column
>>> stuff
>>> <integer>
>>> 1 1
>>> 2 2
>>> 3 3
>>> 4 4
>>>
>>> However it seems that GRanges objects are not:
>>>
>>>> gr <- GRanges("chr1", ir)
>>>> gr
>>> GRanges object with 4 ranges and 1 metadata column:
>>> seqnames ranges strand | stuff
>>> <Rle> <IRanges> <Rle> | <integer>
>>> a chr1 [12, 20] * | 1
>>> b chr1 [12, 20] * | 2
>>> c chr1 [13, 20] * | 3
>>> d chr1 [14, 20] * | 4
>>> -------
>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>> gr[1] <- gr[2]
>>>> gr
>>> GRanges object with 4 ranges and 1 metadata column:
>>> seqnames ranges strand | stuff
>>> <Rle> <IRanges> <Rle> | <integer>
>>> a chr1 [12, 20] * | 2
>>> b chr1 [12, 20] * | 2
>>> c chr1 [13, 20] * | 3
>>> d chr1 [14, 20] * | 4
>>> -------
>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>> So we have an inconsistency within our Vector-based classes.
>>> We need to fix that. It seems that you would have expected the
>>> metadata columns to be altered by [<-. Is this what most people feel?
>>>
>>>>
>>>> The other issue is that r/cbind'ing doesn't seem to work properly for
>>>> unnamed multi-matrix Assays objects. Consider:
>>>>
>>>> > require(SummarizedExperiment)
>>>> > whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4)))
>>>> > whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4)))
>>>> > rbind(whee, whee2)
>>>> Reference class object of class "ShallowSimpleListAssays"
>>>> Field "data":
>>>> List of length 2
>>>> names(2): x1 x2
>>>> >
>>>> > names(whee) <- names(whee2) <- NULL
>>>> > rbind(whee, whee2)
>>>> Reference class object of class "ShallowSimpleListAssays"
>>>> Field "data":
>>>> List of length 1
>>>>
>>>> So, unnaming and rbind'ing results in the loss of a matrix. This is the
>>>> same issue I reported for unnamed multi-matrix assays when rbinding
>>>> multiple SummarizedExperiment objects; I recall that being resolved by
>>>> r/cbind'ing based on position. Should this be done here as well? If
>>>> not,
>>>> perhaps we should force people to name their assays.
>>>
>>> Maybe Martin or Val can chime in for this one.
>>>
>>> Thanks,
>>> H.
>>>
>>>>
>>>> Cheers,
>>>>
>>>> Aaron
>>>>
>>>> > sessionInfo()
>>>> R Under development (unstable) (2015-10-30 r69588)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>> Running under: CentOS release 6.4 (Final)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] parallel stats4 stats graphics grDevices utils
>>>> datasets
>>>> [8] methods base
>>>>
>>>> other attached packages:
>>>> [1] SummarizedExperiment_1.1.2 Biobase_2.31.0
>>>> [3] GenomicRanges_1.23.3 GenomeInfoDb_1.7.3
>>>> [5] IRanges_2.5.5 S4Vectors_0.9.8
>>>> [7] BiocGenerics_0.17.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] zlibbioc_1.17.0 XVector_0.11.0
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list