[R] Need very fast application of 'diff' - ideas?

R. Michael Weylandt michael.weylandt at gmail.com
Sat Jan 28 08:53:53 CET 2012


I'd write your own diff() that eliminates the method dispatch and
argument checking that diff -> diff.default does.

x[-1] - x[-length(x)] # is all you really need.
(# you could also try something like c(x[-1], NA) - x, which leaves a
trailing NA in the result but may be marginally faster as it only
subsets x once; you should profile to find out)

is probably about as fast as you can get within pure R code (wrapping
it in a function adds a little call overhead as well, so if speed is
truly the only thing that matters, best to write the expression
inline). If you want to go for even more speed, you'll have to go to
compiled code; I'd suggest inline+Rcpp as the easiest way to do so.
That could get it down to a single pass through the vector in pure C
(or nice C++), which seems to be a lower bound for speed.
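
A rough sketch of both suggestions, in case it helps to have something
to profile. The names fast_diff and diff_cpp are made up for
illustration, and the compiled part assumes Rcpp and a working compiler
are installed (I use Rcpp::cppFunction here for brevity; the inline
package would do the same job). The vector is smaller than in the
original question just to keep the demo quick. Timings will vary by
machine, so none are quoted.

```r
# Pure-R version: the bare subtraction, no method dispatch or checks.
# Assumes x is a plain numeric vector with at least two elements.
fast_diff <- function(x) x[-1L] - x[-length(x)]

set.seed(1)
x <- round(runif(n = 1e6, min = 0, max = 1000))

# Same answer as base diff() for a plain vector.
stopifnot(isTRUE(all.equal(fast_diff(x), diff(x))))

print(system.time(for (i in 1:10) diff(x)))
print(system.time(for (i in 1:10) fast_diff(x)))

# Compiled version: a single pass through the vector in C++.
# Skipped quietly if Rcpp or a compiler is not available.
diff_cpp <- NULL
if (requireNamespace("Rcpp", quietly = TRUE)) {
  diff_cpp <- tryCatch(
    Rcpp::cppFunction('
      NumericVector diff_cpp(NumericVector x) {
        int n = x.size();
        if (n < 2) return NumericVector(0);
        NumericVector out(n - 1);
        for (int i = 0; i < n - 1; ++i) out[i] = x[i + 1] - x[i];
        return out;
      }'),
    error = function(e) NULL
  )
}

if (!is.null(diff_cpp)) {
  stopifnot(isTRUE(all.equal(diff_cpp(x), diff(x))))
  print(system.time(for (i in 1:10) diff_cpp(x)))
}
```

Comparing the three system.time() results on your own data is the
only reliable way to tell whether the compiled version is worth the
extra dependency.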

Michael

On Fri, Jan 27, 2012 at 7:15 PM, Kevin Ummel <kevinummel at gmail.com> wrote:
> Hi everyone,
>
> Speed is the key here.
>
> I need to find the difference between a vector and its one-period lag (i.e. the difference between each value and the subsequent one in the vector). Let's say the vector contains 10 million random integers between 0 and 1,000. The solution vector will have 9,999,999 values, since there is no lag for the 1st observation.
>
> In R we have:
>
> #Set up input vector
> x = runif(n=10e6, min=0, max=1000)
> x = round(x)
>
> #Find one-period difference
> y = diff(x)
>
> Question is: How can I get the 'diff(x)' part as fast as absolutely possible? I queried some colleagues who work with other languages, and they provided equivalent solutions in Python and Clojure that, on their machines, appear to be potentially much faster (I've put the code below in case anyone is interested). However, they mentioned that the overhead in passing the data between languages could kill any improvements. I don't have much experience integrating other languages, so I'm hoping the community has some ideas about how to approach this particular problem...
>
> Many thanks,
> Kevin
>
> In iPython:
>
> In [3]: import numpy as np
> In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
> In [5]: arr1 = arr[1:].view()
> In [6]: timeit arr2 = arr1 - arr[:-1]
> 10 loops, best of 3: 20.1 ms per loop
>
> In Clojure:
>
> (defn subtract-lag
>  [n]
>  (let [v (take n (repeatedly rand))]
>    (time (dorun (map - v (cons 0 v))))))