[Bioc-devel] equivalent of seqselect,vector,Rle
Hervé Pagès
hpages at fhcrc.org
Sat Dec 14 06:58:37 CET 2013
Hi Michael,
On 12/13/2013 06:39 PM, Michael Lawrence wrote:
> Coercion might suffice. I do remember Patrick optimizing these
> selections with e.g. memcpy(), so they are pretty fast.
The memcpy() trick was used (and is still used in extractROWS) when
seqselect'ing by a Ranges object. For subsetting *by* an integer-Rle,
there was no (and there is still no) optimization: the subscript was
just passed through as.integer() internally. Subsetting by a numeric-Rle
or character-Rle was broken.
> No profiling data though. I do have some performance-critical code
> that has relied on the Rle-based extraction. It would be nice to avoid
> re-evaluating the performance.
From a performance point of view, there should be no significant
difference between doing

  x[as.vector(i)]

and doing

  IRanges:::extractROWS(x, i)

when 'i' is an Rle, because the latter passes 'i' through as.vector()
internally (the internal helper normalizeSingleBracketSubscript() is
what actually does that). However, I would still recommend using the
latter in your package so that it takes advantage of any optimizations
that might happen in the future.
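Base R has no Rle class, but the objects returned by rle() are a rough analogue. The sketch below assumes that base rle output can stand in for an IRanges integer-Rle; it is not the IRanges code path. It illustrates the expansion that both forms above perform before the subscript reaches ordinary `[`:

```r
# Assumption: base R's rle() output stands in for an IRanges integer-Rle.
# A run-length-encoded subscript: 3 runs covering 6 positions.
i_rle <- rle(c(2L, 2L, 2L, 5L, 5L, 7L))

# Expanding the runs yields an ordinary integer vector. This is the role
# as.vector() plays for an integer-Rle before subsetting.
i_plain <- inverse.rle(i_rle)

# The same expansion written out by hand: repeat each value by its run length.
stopifnot(identical(i_plain, rep(i_rle$values, i_rle$lengths)))

# Ordinary subsetting with the expanded subscript.
x <- letters
x[i_plain]
```

The expanded subscript has one element per position covered by the runs, so its size tracks the expanded length rather than the number of runs.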
H.
>
>
> On Fri, Dec 13, 2013 at 6:19 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> On 12/13/2013 01:49 PM, Michael Lawrence wrote:
>
> Thanks, makes sense. I didn't realize we could dispatch on the 'i'
> parameter. I sort of recall the perception that we couldn't, and that
> was one of the main motivations behind seqselect. But it does appear
> possible.
>
>
> Well I was hoping I could do this but it doesn't work :-/
>
> Found in the man page for `[`:
>
> S4 methods:
>
> These operators are also implicit S4 generics, but as primitives,
> S4 methods will be dispatched only on S4 objects ‘x’.
>
> OK, fair enough. But the following is really misleading:
>
> > library(IRanges)
>
> > `[`
> .Primitive("[")
>
> > getGeneric("[")
> standardGeneric for "[" defined from package "base"
>
> function (x, i, j, ..., drop = TRUE)
> standardGeneric("[", .Primitive("["))
> <bytecode: 0x168cba0>
> <environment: 0x1ccfd90>
> Methods may be defined for arguments: x, i, j, drop
> Use showMethods("[") for currently available ones.
>
> So the implicit generic actually does dispatch on 'i'.
>
> I can see my new [,vector,Ranges method:
>
> > selectMethod("[", c("vector", "Ranges"))
> Method Definition:
>
> function (x, i, j, ..., drop = TRUE)
> {
> if (!missing(j) || length(list(...)) > 0L)
> stop("invalid subsetting")
> extractROWS(x, i)
> }
> <environment: namespace:IRanges>
>
> Signatures:
> x i
> target "vector" "Ranges"
> defined "vector" "Ranges"
>
> And dispatch works if I explicitly call the generic:
>
> > getGeneric("[")(letters, IRanges(4, 8))
> [1] "d" "e" "f" "g" "h"
>
> but not if I call the primitive:
>
> > letters[IRanges(4, 8)]
> Error in letters[IRanges(4, 8)] : invalid subscript type 'S4'
>
> It seems the primitive first checks 'x', and only if 'x' is an S4
> object does it delegate to the implicit S4 generic. This is probably
> for performance reasons: it avoids the cost of full multiple dispatch
> when 'x' is an ordinary object.
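The dispatch behaviour described above can be reproduced without IRanges. This sketch assumes a hypothetical S4 class `Span`, a stand-in for Ranges that is not part of any package. It defines a `[` method on the signature (x = "vector", i = "Span"), mirroring the [,vector,Ranges method shown earlier. It then calls the method once through the explicit generic and once through the primitive:

```r
library(methods)

# Hypothetical stand-in for a Ranges object: a single [start, end] span.
setClass("Span", representation(start = "integer", end = "integer"))

# Method mirroring [,vector,Ranges: it is selected by the class of 'i'.
setMethod("[", signature(x = "vector", i = "Span"),
    function(x, i, j, ..., drop = TRUE) {
        if (!missing(j) || length(list(...)) > 0L)
            stop("invalid subsetting")
        x[seq.int(i@start, i@end)]
    })

s <- new("Span", start = 4L, end = 8L)

# Calling the S4 generic explicitly dispatches on 'i' and finds the method.
via_generic <- getGeneric("[")(letters, s)
print(via_generic)

# Calling the primitive does not dispatch: 'letters' is not an S4 object,
# so `[` never consults the S4 methods and rejects the S4 subscript.
via_primitive <- tryCatch(letters[s], error = function(e) e)
if (inherits(via_primitive, "error"))
    print(conditionMessage(via_primitive))
```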
>
> The following hack works:
>
> > `[` <- getGeneric("[")
> > letters[IRanges(4, 8)]
> [1] "d" "e" "f" "g" "h"
>
> but putting this in IRanges feels wrong (I tried, and it caused
> problems with ref classes).
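One way to keep the rebinding of `[` from leaking into other code is to confine it to a local scope. This sketch again assumes the hypothetical `Span` class, redefined here so the block runs on its own. It binds `[` to the S4 generic only inside local(). It does not attempt to reproduce or address the ref-class problems mentioned above:

```r
library(methods)

# Hypothetical stand-in for a Ranges object: a single [start, end] span.
setClass("Span", representation(start = "integer", end = "integer"))
setMethod("[", signature(x = "vector", i = "Span"),
    function(x, i, j, ..., drop = TRUE) x[seq.int(i@start, i@end)])

s <- new("Span", start = 4L, end = 8L)

# The hack from the thread, confined to local(): inside this scope the
# symbol `[` resolves to the S4 generic, so dispatch on 'i' happens.
hacked <- local({
    `[` <- getGeneric("[")
    letters[s]
})

# Outside local(), `[` is still the primitive and the same call fails.
plain <- tryCatch(letters[s], error = function(e) e)
```

The method body itself still calls the primitive `[`, because the method was defined in the global environment rather than inside local().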
>
> So I guess I should go ahead and export/document extractROWS()
> and replaceROWS(). What are the other options?
>
> In the meantime, of course, you can always pass your Ranges or Rle
> subscript through unlist() or as.vector() first (not much more typing
> than using seqselect(), and I don't expect it to impact performance
> much in practice).
>
> H.
>
>
> Michael
>
>
> On Fri, Dec 13, 2013 at 1:10 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
>
> On 12/13/2013 01:03 PM, Michael Lawrence wrote:
>
> I used to use seqselect for subsetting ordinary R vectors by
> Ranges and Rle. IRanges:::extractROWS does this, but it's hidden
> behind the namespace. What is the public way of doing this?
>
> Maybe we just need to export extractROWS()? Or something with a
> better name?
>
>
> I'll add [,vector,Ranges and [,vector,Rle methods (and probably also
> [,factor,Ranges and [,factor,Rle). They'll just be wrappers to
> IRanges:::extractROWS, which I'd like to keep hidden.
>
> I was not sure people were doing this on ordinary R vectors, so I was
> waiting for someone to speak up.
>
> H.
>
>
> Michael
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>