[Bioc-devel] Unexpected behaviour with Assays and Vector classes

Morgan, Martin Martin.Morgan at roswellpark.org
Fri Dec 4 01:15:41 CET 2015



> -----Original Message-----
> From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org] On Behalf Of
> Hervé Pagès
> Sent: Sunday, November 15, 2015 2:54 PM
> To: Aaron Lun; bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] Unexpected behaviour with Assays and Vector
> classes
> 
> Hi Aaron,
> 
> On 11/15/2015 10:59 AM, Aaron Lun wrote:
> > Hello all,
> >
> > I've encountered some unexpected behaviour with some of the base
> > classes while developing stuff for genomic interactions. The first
> > issue lies with the subset replacement in the Vector class. Let's say
> > I make a derived class "foo" inheriting from Vector, as below:
> >
> >  > require(S4Vectors)
> >  > setClass("foo", contains="Vector", slots=c(blah="integer"))
> >  > setMethod("parallelSlotNames", "foo", function(x) {
> >  >     c("blah", callNextMethod())
> >  > })
> > [1] "parallelSlotNames"
> >  > setMethod("c", "foo", function(x, ..., recursive=TRUE) {
> >  >    new.blah <- do.call(c, lapply(list(x, ...), FUN=slot, name="blah"))
> >  >    new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols))
> >  >    new("foo", blah=new.blah, metadata=metadata(x),
> >  >        elementMetadata=new.mcols)
> >  > })
> > [1] "c"
> >
> > Construction gives what you'd expect:
> >
> >  > a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5))
> >  > a@blah
> > [1] 1 2 3 4 5
> >  > mcols(a)$stuff
> > [1] 1 2 3 4 5
> >
> > However, if I try to do subset replacement, I get this:
> >
> >  > a[1] <- a[2]
> >  > a@blah
> > [1] 2 2 3 4 5
> >  > mcols(a)$stuff
> > [1] 1 2 3 4 5
> >
> > So, "blah" is replaced properly, but "elementMetadata" is not. This is
> > attributable to a line in "replaceROWS" which preserves the mcols of
> > the original object during replacement (also for "names"). Should this
> > line be removed to give expected behaviour for the elementMetadata?
> 
> For the treatment of the metadata columns we are usually mimicking how
> names are treated in base R:
> 
>  > x <- c(a=1, b=2, c=3, d=4)
>  > x[1] <- x[2]
>  > x
> a b c d
> 2 2 3 4
> 
> The names are not affected.
> 
> IRanges objects are following that model:
> 
>  > library(IRanges)
>  > ir <- IRanges(11:14, 20, names=letters[1:4])
>  > mcols(ir) <- DataFrame(stuff=1:4)
>  > ir
> IRanges of length 4
>      start end width names
> [1]    11  20    10     a
> [2]    12  20     9     b
> [3]    13  20     8     c
> [4]    14  20     7     d
>  > mcols(ir)
> DataFrame with 4 rows and 1 column
>        stuff
>    <integer>
> 1         1
> 2         2
> 3         3
> 4         4
>  > ir[1] <- ir[2]
>  > ir
> IRanges of length 4
>      start end width names
> [1]    12  20     9     a
> [2]    12  20     9     b
> [3]    13  20     8     c
> [4]    14  20     7     d
>  > mcols(ir)
> DataFrame with 4 rows and 1 column
>        stuff
>    <integer>
> 1         1
> 2         2
> 3         3
> 4         4
> 
> However it seems that GRanges objects are not:
> 
>  > gr <- GRanges("chr1", ir)
>  > gr
> GRanges object with 4 ranges and 1 metadata column:
>      seqnames    ranges strand |     stuff
>         <Rle> <IRanges>  <Rle> | <integer>
>    a     chr1  [12, 20]      * |         1
>    b     chr1  [12, 20]      * |         2
>    c     chr1  [13, 20]      * |         3
>    d     chr1  [14, 20]      * |         4
>    -------
>    seqinfo: 1 sequence from an unspecified genome; no seqlengths
>  > gr[1] <- gr[2]
>  > gr
> GRanges object with 4 ranges and 1 metadata column:
>      seqnames    ranges strand |     stuff
>         <Rle> <IRanges>  <Rle> | <integer>
>    a     chr1  [12, 20]      * |         2
>    b     chr1  [12, 20]      * |         2
>    c     chr1  [13, 20]      * |         3
>    d     chr1  [14, 20]      * |         4
>    -------
>    seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> So we have an inconsistency within our Vector-based classes.
> We need to fix that. It seems that you would have expected the metadata
> columns to be altered by [<-. Is this what most people feel?
> 
> >
> > The other issue is that r/cbind'ing doesn't seem to work properly for
> > unnamed multi-matrix Assays objects. Consider:
> >
> >  > require(SummarizedExperiment)
> >  > whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4)))
> >  > whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4)))
> >  > rbind(whee, whee2)
> > Reference class object of class "ShallowSimpleListAssays"
> > Field "data":
> > List of length 2
> > names(2): x1 x2
> >  >
> >  > names(whee) <- names(whee2) <- NULL
> >  > rbind(whee, whee2)
> > Reference class object of class "ShallowSimpleListAssays"
> > Field "data":
> > List of length 1
> >
> > So, unnaming and rbind'ing results in the loss of a matrix. This is
> > the same issue I reported for unnamed multi-matrix assays when
> > rbinding multiple SummarizedExperiment objects; I recall that being
> > resolved by r/cbind'ing based on position. Should this be done here as
> > well? If not, perhaps we should force people to name their assays.
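
For illustration only, here is a minimal base R sketch of what combining by position (rather than by name) could look like for plain lists of matrices. rbind_by_position() is a hypothetical helper, not a SummarizedExperiment function, and this is not necessarily how Assays implements r/cbind:

```r
# Hypothetical helper (not part of any package): row-bind the i-th
# matrix of each input list together, ignoring any names on the lists.
rbind_by_position <- function(...) {
    args <- list(...)
    # every input must hold the same number of matrices
    stopifnot(length(unique(lengths(args))) == 1L)
    lapply(seq_along(args[[1L]]), function(i) {
        do.call(rbind, lapply(args, `[[`, i))
    })
}

# Plain unnamed lists standing in for unnamed multi-matrix assays
whee  <- list(matrix(1, 3, 4), matrix(2, 3, 4))
whee2 <- list(matrix(3, 3, 4), matrix(4, 3, 4))

out <- rbind_by_position(whee, whee2)
length(out)      # 2: no matrix is dropped
dim(out[[1]])    # 6 4: rows of the two first matrices stacked
```

Matching on position sidesteps the missing names entirely, at the cost of trusting that every object lists its matrices in the same order.
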
> 
> Maybe Martin or Val can chime in for this one.

Sorry for the late reply. The Assays r/cbind issue is addressed in SummarizedExperiment devel 1.1.4 (in svn now; available via biocLite() on Sunday, all being well) and will be fixed in release, likely as 1.0.2.
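
A rough way to tell whether a given version should carry the fix, assuming the version numbers above hold (the release number in particular is only "likely") and that 1.1.x is the devel line and 1.0.x the release line. has_assays_fix() is a hypothetical helper written for this message, not something shipped in a package:

```r
# Hypothetical helper: TRUE if a SummarizedExperiment version string is
# at or past the version expected to contain the fix.
# Assumptions: devel fix in 1.1.4; release fix in 1.0.2 (not yet confirmed).
has_assays_fix <- function(version) {
    v <- package_version(version)
    if (v >= "1.1.0") v >= "1.1.4" else v >= "1.0.2"
}

has_assays_fix("1.1.2")   # FALSE: the version in the sessionInfo() below
has_assays_fix("1.1.4")   # TRUE

# With the package installed, one could pass the installed version:
# has_assays_fix(as.character(packageVersion("SummarizedExperiment")))
```
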

Martin

> 
> Thanks,
> H.
> 
> >
> > Cheers,
> >
> > Aaron
> >
> >  > sessionInfo()
> > R Under development (unstable) (2015-10-30 r69588)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: CentOS release 6.4 (Final)
> >
> > locale:
> >   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> > [8] methods   base
> >
> > other attached packages:
> > [1] SummarizedExperiment_1.1.2 Biobase_2.31.0
> > [3] GenomicRanges_1.23.3       GenomeInfoDb_1.7.3
> > [5] IRanges_2.5.5              S4Vectors_0.9.8
> > [7] BiocGenerics_0.17.1
> >
> > loaded via a namespace (and not attached):
> > [1] zlibbioc_1.17.0 XVector_0.11.0
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> --
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 



