[Rd] In C, a fast way to slice a vector?
Patrick Aboyoun
paboyoun at fhcrc.org
Mon May 11 06:37:42 CEST 2009
Saptarshi,
I know of two alternatives you can use to do fast extraction of
consecutive subsequences of a vector:
1) Fast copy: The method you mentioned of creating a memcpy'd vector
2) Pointer management: Creating an externalptr object in R and managing
the start and end of your data yourself
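To make the difference between the two alternatives concrete, here is a rough sketch in plain C, kept free of the R headers so it compiles on its own. The names byte_view, slice_copy and slice_view are hypothetical and exist only for this illustration; they are not part of R, IRanges or Biostrings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view type: borrows the parent's bytes and owns nothing. */
typedef struct {
    const unsigned char *data;  /* first byte of the slice */
    size_t len;                 /* number of bytes in the slice */
} byte_view;

/* 1) Fast copy: allocate a new buffer and memcpy src[start..n) into it.
 * Cost is O(n - start). Returns NULL if malloc fails; the caller frees. */
unsigned char *slice_copy(const unsigned char *src, size_t n,
                          size_t start, size_t *out_len)
{
    assert(start <= n);
    *out_len = n - start;
    /* Request at least one byte so a zero-length slice is still non-NULL. */
    unsigned char *dst = malloc(*out_len ? *out_len : 1);
    if (dst != NULL)
        memcpy(dst, src + start, *out_len);
    return dst;
}

/* 2) Pointer management: record only where the slice starts and how long
 * it is. Cost is O(1), but the view is valid only while src is alive. */
byte_view slice_view(const unsigned char *src, size_t n, size_t start)
{
    assert(start <= n);
    byte_view v = { src + start, n - start };
    return v;
}
```

In an R extension, the copy would presumably land in a freshly allocated RAWSXP, while the view would have to be wrapped in something like an external pointer whose protected slot keeps the parent vector from being garbage collected. That lifetime bookkeeping is the price of avoiding the copy.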
If you are looking for a prototyping environment to try, I recommend
using the IRanges and Biostrings packages from the Bioconductor
project. The IRanges package contains a function called subseq for
performing 1) on all basic vector types (raw, logical, integer, etc.),
and the Biostrings package contains a subseq method on an
externalptr-based class that implements 2).
I was going to lobby R core members quietly about adding something
akin to subseq from IRanges into base R, since it is extremely useful
for all long vectors and could replace every x[a:b] subsetting call
with a <= b in R code, but this publicity can't hurt.
Here is an example:
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("IRanges", "Biostrings"))
<< download output omitted >>
> suppressMessages(library(Biostrings))
> x <- rep(charToRaw("a"), 1e7)
> y <- BString(rawToChar(x))
> system.time(x[13:1e7])
   user  system elapsed
  0.304   0.073   0.378
> system.time(subseq(x, 13))
   user  system elapsed
  0.011   0.007   0.019
> system.time(subseq(y, 13))
   user  system elapsed
  0.003   0.000   0.004
> identical(x[13:1e7], subseq(x, 13))
[1] TRUE
> identical(x[13:1e7], charToRaw(as.character(subseq(y, 13))))
[1] TRUE
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
i386-apple-darwin9.6.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.13.5 IRanges_1.3.5
loaded via a namespace (and not attached):
[1] Biobase_2.5.2
Quoting Saptarshi Guha <saptarshi.guha at gmail.com>:
> Hello,
> Suppose in the following code,
> PROTECT(sr = R_tryEval( .... ))
>
> sr is a RAWSXP vector. I wish to return another RAWSXP starting at
> position 13 onwards (base=0).
>
> I could create another RAWSXP of the correct length and then memcpy
> the required bytes and length to this new one.
>
> However is there a more efficient method?
>
> Regards
> Saptarshi Guha
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>