[Bioc-devel] Subsetting an RleList object

Hervé Pagès hpages at fhcrc.org
Wed Oct 30 01:55:47 CET 2013


Hi Michael,

In Bioc < 2.13, subsetting was a mess. In particular, handling of
list-like subscripts was rather unpredictable. It would work only
if you were lucky enough to try it with one of the few supported
types (like IntegerList, LogicalList, or IRangesList), but it didn't
work for other very natural types like list or CharacterList.
Or it would work for [ but not for [<-, or vice-versa:

   x <- splitAsList(letters[1:6], c(2, 4, 3, 2, 2, 4))
   x[list(1)]            # doesn't work in BioC < 2.13!
   x[list(1)] <- "XX"    # works in BioC < 2.13!

Or, if both [ and [<- worked, they could behave inconsistently: one
would require the list-like subscript to have the same length as 'x'
but the other wouldn't. Or one would use the names on the subscript
and on 'x' to map the list elements between the two, but the other
wouldn't.

Hopefully in BioC 2.13, subsetting behaves more consistently (at least
that was the intention). For example, the names on the subscript and
on 'x' are now always used to map the list elements between the two:

   > x[list(`4`=2:1)]
   CharacterList of length 1
   [["4"]] f b
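
To make the name-based mapping concrete, here is a rough base-R sketch that
mimics it on an ordinary list. The helper name 'subset_by_named_list' is
hypothetical and is not part of IRanges; the real logic lives in
subsetListByList(), which this does not try to reproduce.

```r
# Hypothetical helper (not IRanges code): subset a named plain list 'x' by a
# named list subscript 'i', pairing elements by name rather than by position.
subset_by_named_list <- function(x, i) {
  stopifnot(is.list(x), is.list(i), !is.null(names(x)), !is.null(names(i)))
  missing_names <- setdiff(names(i), names(x))
  if (length(missing_names) > 0L)
    stop("subscript names not found in 'x': ",
         paste(missing_names, collapse = ", "))
  # For each name in the subscript, apply its indices to the matching element.
  Map(function(elt, idx) elt[idx], x[names(i)], i)
}

# Same data as above, held in a plain list instead of a CharacterList.
x <- split(letters[1:6], c(2, 4, 3, 2, 2, 4))
res <- subset_by_named_list(x, list(`4` = 2:1))
print(res)
```

The subscript element named "4" is paired with the element of 'x' named "4",
wherever that element sits in 'x'.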

It is also now an error if the subscript has names but 'x' does not:

   > unname(x)[list(`4`=2:1)]
   Error in subsetListByList(x, i) :
     cannot subscript an unnamed list-like object by a named list-like object

(I should probably change this message to: "cannot subset an unnamed
list-like object by a named list-like subscript".)

This is to be consistent with subsetting a Vector object by name, which
fails if 'x' has no names:

   > IRanges(1:4, 5)["a"]
   Error in normalizeSingleBracketSubscript(i, x) :
     cannot subset by character when names are NULL

If the subscript is a list-like object with names, the assumption is
that the user intended those names to be mapped against 'x' names.
If 'x' doesn't have names, I think it should fail rather than silently
fall back to position-based mapping. That at least gives the user a
chance either to put names on 'x' (maybe s/he just forgot) or to
remove them from the subscript. If we really want to fall back to
position-based mapping, it should at least issue a warning, I think.
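
The policy argued for above can be sketched in base R on plain lists.
Everything here is an assumption for illustration: 'resolve_list_subscript'
and its 'fallback' argument are hypothetical names and do not exist in
IRanges.

```r
# Hypothetical sketch: decide how a list-like subscript 'i' maps onto 'x'.
# Returns the positions in 'x' that the elements of 'i' refer to.
resolve_list_subscript <- function(x, i, fallback = c("error", "warn")) {
  fallback <- match.arg(fallback)
  if (is.null(names(i)))
    return(seq_along(i))                 # unnamed subscript: positional
  if (is.null(names(x))) {
    msg <- paste("cannot subset an unnamed list-like object",
                 "by a named list-like subscript")
    if (fallback == "error")
      stop(msg)                          # strict: let the user fix the names
    warning(msg, "; falling back to position-based mapping")
    return(seq_along(i))
  }
  pos <- match(names(i), names(x))       # both named: map by name
  if (anyNA(pos))
    stop("subscript contains names not found in 'x'")
  pos
}

x <- split(letters[1:6], c(2, 4, 3, 2, 2, 4))
i <- list(`4` = 2:1)

by_name <- resolve_list_subscript(x, i)            # 'x' has names: match them
strict  <- tryCatch(resolve_list_subscript(unname(x), i),
                    error = function(e) "error")
lenient <- suppressWarnings(
  resolve_list_subscript(unname(x), i, fallback = "warn"))
```

With the strict default the user sees the problem immediately; the "warn"
mode shows what a warning-based fallback to positions could look like.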

One thing I didn't change from pre-BioC-2.13 behavior is that a
list-like subscript (when unnamed) is not recycled along 'x'. It's
open to discussion whether this would be a good thing to have or not.
Changing this would be pretty disruptive though...

Cheers,
H.


On 10/29/2013 03:51 PM, Michael Lawrence wrote:
> I think we should just drop the names for the user. The Bioc <2.13
> behavior seems reasonable to me. Please elaborate on the subtle issues.
> Most users would not expect the *names* on the index to have any effect
> on the extraction, in accordance with the behavior of ordinary vectors.
> The only difference with Lists is that there is a partitioning, which
> seems unrelated to naming.
>
> Michael
>
>
> On Tue, Oct 29, 2013 at 3:40 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Thomas,
>
>     For the same reason that you cannot subset a Vector object with no
>     names by name:
>
>        > IRanges(1:4, width=10)[letters[1:4]]
>        Error in normalizeSingleBracketSubscript(i, x) :
>          cannot subset by character when names are NULL
>
>     you cannot subset an unnamed List object using a named list-like
>     subscript. So in your case, just remove the names on 'keep_ranges'
>     (which are probably not desired anyway) before using it as a
>     subscript:
>
>
>        > keep_ranges
>        CompressedIRangesList of length 18
>        $`1`
>        IRanges of length 1
>            start end width
>        [1]    20 108    89
>
>        $`2`
>        IRanges of length 1
>            start end width
>        [1]    43 131    89
>
>        $`3`
>        IRanges of length 1
>            start end width
>        [1]    21 105    85
>
>        ...
>        <15 more elements>
>
>        > return_rles[ unname(keep_ranges) ]
>        RleList of length 18
>        [[1]]
>        logical-Rle of length 89 with 1 run
>          Lengths:   89
>          Values : TRUE
>
>        [[2]]
>        logical-Rle of length 89 with 1 run
>          Lengths:   89
>          Values : TRUE
>
>        [[3]]
>        logical-Rle of length 85 with 1 run
>          Lengths:   85
>          Values : TRUE
>
>        [[4]]
>        logical-Rle of length 85 with 1 run
>          Lengths:   85
>          Values : TRUE
>
>        [[5]]
>        logical-Rle of length 102 with 1 run
>          Lengths:  102
>          Values : TRUE
>
>        ...
>        <13 more elements>
>
>     Prior to BioC 2.13, it was possible to subset an unnamed List object by
>     a named list-like subscript, and in that case, the names on the
>     subscript were ignored and the subscript was treated as parallel to the
>     object to subset. However this behavior was somewhat dangerous (could
>     lead to subtle issues) and didn't follow the spirit of what subsetting
>     an unnamed Vector by name does. So it's not supported anymore.
>
>     Sorry for the inconvenience,
>     H.
>
>
>
>     On 10/29/2013 03:05 PM, Thomas Sandmann wrote:
>
>         Hi Herve,
>
>         I have updated to IRanges 1.20.4 now, but unfortunately, I still
>         encounter an error when I try to subset a CompressedRleList or
>         SimpleRleList with a CompressedIRangesList or SimpleIRangesList.
>
>         Would you mind having a look at where I am going wrong? (My two
>         example objects are available in the rdata object at the url
>         shown below).
>
>         con=url("http://dl.dropboxusercontent.com/u/126180/example.rdata")
>         load( con )
>         return_rles[ keep_ranges ]
>
>         Error in subsetListByList(x, i) (from List-class.R#205) :
>             cannot subscript an unnamed list-like object by a named list-like object
>
>         R version 3.0.2 (2013-09-25)
>         Platform: x86_64-unknown-linux-gnu (64-bit)
>
>         locale:
>          [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>          [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>          [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>          [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>          [9] LC_ADDRESS=C               LC_TELEPHONE=C
>         [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>         attached base packages:
>         [1] parallel  stats     graphics  grDevices utils     datasets  methods
>         [8] base
>
>         other attached packages:
>          [1] trimPrimers_1.3.0    Rsamtools_1.14.1     Biostrings_2.30.0
>          [4] GenomicRanges_1.14.2 XVector_0.2.0        IRanges_1.20.4
>          [7] BiocGenerics_0.8.0   Defaults_1.1-1       BiocInstaller_1.12.0
>         [10] roxygen2_2.2.2       digest_0.6.3         devtools_1.3
>
>         loaded via a namespace (and not attached):
>          [1] bitops_1.0-6   brew_1.0-6     compiler_3.0.2 evaluate_0.5.1 httr_0.2
>          [6] memoise_0.1    RCurl_1.95-4.1 stats4_3.0.2   stringr_0.6.2  tools_3.0.2
>         [11] whisker_0.3-2  zlibbioc_1.8.0
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>     _________________________________________________
>     Bioc-devel at r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


