[Rd] In C, a fast way to slice a vector?

Laurent Gautier lgautier at gmail.com
Tue May 12 13:19:43 CEST 2009



r-devel-request at r-project.org wrote:
> 
> Impressive stuff. Nice to see people giving some thought to this.
> I will explore the packages you mentioned.
> 
> Thank you
> 
> Saptarshi Guha
> 
> 
> 
> On Mon, May 11, 2009 at 12:37 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
>> Saptarshi,
>> I know of two alternatives you can use to do fast extraction of consecutive
>> subsequences of a vector:
>>
>> 1) Fast copy: The method you mentioned of creating a memcpy'd vector
>> 2) Pointer management: Creating an externalptr object in R and managing the
>> start and end of your data
>>
>> If you are looking for a prototyping environment to try, I recommend using
>> the IRanges and Biostrings packages from the Bioconductor project. The
>> IRanges package contains a function called subseq for performing 1) on all
>> basic vector types (raw, logical, integer, etc.) and the Biostrings package
>> contains a subseq method on an externalptr-based class that implements 2).
>>
>> I was going to lobby R core members quietly about adding something akin to
>> subseq from IRanges into base R since it is extremely useful for all long
>> vectors and could replace all a:b calls with a <= b in R code, but this
>> publicity can't hurt.


The Python development team has been developing something similar for
Python 3.0 (Buffer and Memoryview), and they are backporting it to the
latest 2.x releases.
I have just started toying with it, and it looks very nice.
There might be good ideas to take from there into a possible R built-in
capability.
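
To make the two alternatives quoted above concrete, here is a minimal
plain-C sketch of "copy the tail" versus "keep a window onto memory owned
by someone else". It deliberately does not use the R headers; the names
byte_view, slice_copy and slice_view are hypothetical and exist only for
illustration. With the R API, I would expect alternative 1 to allocate the
result with allocVector(RAWSXP, n) and copy from RAW(sr) + 13 instead of
calling malloc, but treat that as an assumption rather than tested code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view type: a window onto bytes owned by someone else. */
typedef struct {
    const unsigned char *data;  /* first byte of the window (not owned) */
    size_t length;              /* number of bytes in the window */
} byte_view;

/* Alternative 1: allocate a new buffer and memcpy the tail into it.
 * Cost is proportional to the number of bytes copied.
 * Returns NULL on allocation failure; the caller must free() the result. */
unsigned char *slice_copy(const unsigned char *src, size_t n,
                          size_t start, size_t *out_len)
{
    assert(src != NULL && out_len != NULL);
    if (start > n)
        start = n;                       /* clamp: an empty slice, not an overrun */
    *out_len = n - start;
    /* malloc(0) may return NULL, so always request at least one byte */
    unsigned char *dst = malloc(*out_len ? *out_len : 1);
    if (dst != NULL)
        memcpy(dst, src + start, *out_len);
    return dst;
}

/* Alternative 2: no copy at all, only record where the window starts and
 * how long it is. Constant cost, but the view is valid only for as long
 * as the original buffer is alive and unchanged. */
byte_view slice_view(const unsigned char *src, size_t n, size_t start)
{
    assert(src != NULL);
    if (start > n)
        start = n;
    byte_view v = { src + start, n - start };
    return v;
}
```

For a 20-byte buffer, slice_copy(buf, 20, 13, &len) hands back a fresh
7-byte buffer, while slice_view(buf, 20, 13) returns a view whose data
pointer is simply buf + 13. The price of the second form is lifetime
management: something has to keep the original buffer alive while the view
is in use, which is the bookkeeping an externalptr-based class has to do.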



L.




>> Here is an example:
>>
>>> source("http://bioconductor.org/biocLite.R")
>>> biocLite(c("IRanges", "Biostrings"))
>> << download output omitted >>
>>> suppressMessages(library(Biostrings))
>>> x <- rep(charToRaw("a"), 1e7)
>>> y <- BString(rawToChar(x))
>>> system.time(x[13:1e7])
>>    user  system elapsed
>>   0.304   0.073   0.378
>>> system.time(subseq(x, 13))
>>    user  system elapsed
>>   0.011   0.007   0.019
>>> system.time(subseq(y, 13))
>>    user  system elapsed
>>   0.003   0.000   0.004
>>> identical(x[13:1e7], subseq(x, 13))
>> [1] TRUE
>>> identical(x[13:1e7], charToRaw(as.character(subseq(y, 13))))
>> [1] TRUE
>>> sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
>> i386-apple-darwin9.6.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] Biostrings_2.13.5 IRanges_1.3.5
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.5.2
>>
>>
>>
>> Quoting Saptarshi Guha <saptarshi.guha at gmail.com>:
>>
>>> Hello,
>>> Suppose in the following code,
>>> PROTECT(sr = R_tryEval( .... ))
>>>
>>> sr is a RAWSXP vector. I wish to return another RAWSXP starting at
>>> position 13 onwards (base=0).
>>>
>>> I could create another RAWSXP of the correct length and then memcpy
>>> the required bytes and length to this new one.
>>>
>>> However, is there a more efficient method?
>>>
>>> Regards
>>> Saptarshi Guha
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>>


