[Bioc-devel] equivalent of seqselect,vector,Rle

Hervé Pagès hpages at fhcrc.org
Sat Dec 14 06:58:37 CET 2013

Hi Michael,

On 12/13/2013 06:39 PM, Michael Lawrence wrote:
> Coercion might suffice. I do remember Patrick optimizing these
> selections with e.g. memcpy(), so they are pretty fast.

The memcpy() trick was used (and is still used in extractROWS) when
seqselect'ing by a Ranges object. For subsetting *by* an integer-Rle,
there was no (and there is still no) optimization: the subscript was
just passed thru as.integer() internally. Subsetting by a numeric-Rle
or character-Rle was broken.
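
To make the "passed thru as.integer()" case concrete, here is a minimal
sketch in base R only. It uses base rle() as a stand-in for an integer-Rle
subscript (an assumption, so the snippet runs without IRanges), and the
names 'i_rle', 'i_int' and 'res' are made up for the example:

```r
# Base R stand-in for an integer-Rle subscript (assumption: IRanges is not
# loaded; an IRanges Rle would be expanded with as.integer() instead).
i_rle <- rle(c(2L, 2L, 2L, 5L, 7L, 7L))

# Expand the run-length encoding to a plain integer vector ...
i_int <- as.integer(inverse.rle(i_rle))

# ... and subset the ordinary vector with it, with no special optimization.
x <- letters
res <- x[i_int]
```

The cost here is the expansion of the runs into a full-length integer
vector, which is what an unoptimized Rle subscript pays as well.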

> No profiling
> data though. I do have some performance critical code that has relied on
> the Rle-based extraction. Would be nice to avoid re-evaluating the
> performance.

From a performance point of view, there should be no significant
difference between doing

   x[as.vector(i)]
and doing

   IRanges:::extractROWS(x, i)

when 'i' is an Rle, because the latter passes 'i' thru as.vector()
internally (internal helper normalizeSingleBracketSubscript actually
does that). However I would still recommend you use the latter in
your package so it will take advantage of optimizations that might
happen in the future.


> On Fri, Dec 13, 2013 at 6:19 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>     On 12/13/2013 01:49 PM, Michael Lawrence wrote:
>         Thanks, makes sense. Didn't realize we could dispatch on the 'i'
>         parameter. I sort of recall the perception that we couldn't, and
>         that was one of the main motivations behind seqselect. But it
>         does appear possible.
>     Well I was hoping I could do this but it doesn't work :-/
>     Found in the man page for `[`:
>         S4 methods:
>           These operators are also implicit S4 generics, but as primitives,
>           S4 methods will be dispatched only on S4 objects ‘x’.
>     OK, fair enough. But the following is really misleading:
>        > library(IRanges)
>        > `[`
>        .Primitive("[")
>        > getGeneric("[")
>        standardGeneric for "[" defined from package "base"
>        function (x, i, j, ..., drop = TRUE)
>        standardGeneric("[", .Primitive("["))
>        <bytecode: 0x168cba0>
>        <environment: 0x1ccfd90>
>        Methods may be defined for arguments: x, i, j, drop
>        Use  showMethods("[")  for currently available ones.
>     So the implicit generic actually does dispatch on 'i'.
>     I can see my new [,vector,Ranges method:
>        > selectMethod("[", c("vector", "Ranges"))
>        Method Definition:
>        function (x, i, j, ..., drop = TRUE)
>        {
>          if (!missing(j) || length(list(...)) > 0L)
>              stop("invalid subsetting")
>          extractROWS(x, i)
>        }
>        <environment: namespace:IRanges>
>        Signatures:
>                x        i
>        target  "vector" "Ranges"
>        defined "vector" "Ranges"
>     And dispatch works if I explicitly call the generic:
>        > getGeneric("[")(letters, IRanges(4, 8))
>        [1] "d" "e" "f" "g" "h"
>     but not if I call the primitive:
>        > letters[IRanges(4, 8)]
>        Error in letters[IRanges(4, 8)] : invalid subscript type 'S4'
>     Seems like the primitive first checks 'x' and only if it's an
>     S4 object it then delegates to the implicit S4 generic. Probably
>     for performance reasons as it avoids the cost of having to perform
>     full multiple dispatch when 'x' is an ordinary object.
>     The following hack works:
>        > `[` <- getGeneric("[")
>        > letters[IRanges(4, 8)]
>        [1] "d" "e" "f" "g" "h"
>     but putting this in IRanges feels wrong (I tried and it caused
>     troubles with ref classes).
>     So I guess I should go ahead and export/document extractROWS()
>     and replaceROWS(). What are the other options?
>     In the meantime of course you can always pass your Ranges or Rle
>     subscript thru unlist() or as.vector() first (not much more typing
>     than doing seqselect() and I don't expect this will impact performance
>     too much in practice).
>     H.
>         Michael
>         On Fri, Dec 13, 2013 at 1:10 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>              Hi Michael,
>              On 12/13/2013 01:03 PM, Michael Lawrence wrote:
>                  I used to use seqselect for subsetting ordinary R vectors by
>                  Ranges and Rle. IRanges:::extractROWS does this, but it's
>                  hidden behind the namespace.
>                  What is the public way of doing this?
>                  Maybe we just need to export extractROWS()? Or something
>                  with a better name?
>              I'll add [,vector,Ranges and [,vector,Rle methods (and probably
>              also [,factor,Ranges and [,factor,Rle). They'll just be wrappers
>              to IRanges:::extractROWS which I'd like to keep hidden.
>              Was not sure people were doing this on ordinary R vectors so was
>              waiting for someone to speak up.
>              H.
>                  Michael
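
For anyone who wants to reproduce the dispatch behavior from the quoted
session without IRanges, here is a minimal sketch. "ToyRange" is a
hypothetical S4 class standing in for a Ranges subscript, and the method
body is only an illustration, not what IRanges does internally:

```r
library(methods)

# Hypothetical toy class standing in for a Ranges subscript (assumption).
setClass("ToyRange", slots = c(start = "integer", end = "integer"))

# Method on the implicit generic for `[`, with a signature on 'x' and 'i'.
setMethod("[", signature(x = "character", i = "ToyRange"),
    function(x, i, j, ..., drop = TRUE) {
        if (!missing(j) || length(list(...)) > 0L)
            stop("invalid subsetting")
        # Expand the range to integer positions and use ordinary subsetting.
        x[seq.int(i@start, i@end)]
    })

r <- new("ToyRange", start = 4L, end = 8L)

# Calling the generic explicitly performs S4 dispatch on 'x' and 'i'.
via_generic <- getGeneric("[")(letters, r)

# Calling the primitive with an ordinary (non-S4) 'x' does not reach the
# S4 method, so the S4 subscript is rejected; capture the error object.
via_primitive <- tryCatch(letters[r], error = function(e) e)
```

If this behaves as in the session above, 'via_generic' holds the selected
letters and 'via_primitive' holds an error condition, because the primitive
only hands over to S4 dispatch when 'x' itself is an S4 object.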

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
