[R] speed of a vector operation question

Fri Apr 26 22:20:15 CEST 2013

> I think the sum way is the best.

On my Linux machine running R-3.0.0 the sum way is slightly faster:
  > x <- rexp(1e6, 2)
  > system.time(for(i in 1:100)sum(x>.3 & x<.5))
     user  system elapsed
    4.664   0.340   5.018
  > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
     user  system elapsed
    5.017   0.160   5.186

If you are doing many of these counts on the same dataset you
can save time by using functions like cut(), table(), ecdf(), and
findInterval().  E.g.,
> system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
   user  system elapsed
  5.332   0.568   5.909
> system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
   user  system elapsed
  0.500   0.008   0.511
> all.equal(as.vector(r1), as.vector(r2))
[1] TRUE
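Because the original poster's vector is already sorted, the findInterval() route mentioned above can replace the full scan with a binary search. The following is a minimal sketch, not a tested recipe. The function names count_between_sorted() and bin_counts() are hypothetical. The left.open argument of findInterval() assumes a newer R (3.3.0 or later) than the 3.0.0 and 2.14.2 versions discussed in this thread.

```r
## Hypothetical helper: count elements of a sorted vector v with lo < v < hi
## using two binary searches instead of scanning every element.
## Assumes v is sorted non-decreasingly (duplicates allowed), has no NAs,
## and that R >= 3.3.0, where findInterval() has the 'left.open' argument.
count_between_sorted <- function(v, lo, hi) {
  n_le_lo <- findInterval(lo, v)                    # number of v[i] <= lo
  n_lt_hi <- findInterval(hi, v, left.open = TRUE)  # number of v[i] <  hi
  max(0L, n_lt_hi - n_le_lo)                        # 0 when lo >= hi
}

## Hypothetical helper: counts of x in the bins (breaks[i], breaks[i+1]],
## the same right-closed bins that cut(x, breaks) uses by default.
## 'breaks' must be sorted; x need not be.
bin_counts <- function(x, breaks) {
  idx <- findInterval(x, breaks, left.open = TRUE)  # 0 or length(breaks) if outside
  tabulate(idx, nbins = length(breaks) - 1L)        # out-of-range indices are ignored
}
```

For a single count on an unsorted vector, sorting first costs more than one scan with sum(), so count_between_sorted() only pays off when the data already arrive sorted or many counts are taken from the same data.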

You should do the timings yourself, as the relative speeds will depend
on the version or dialect of the R interpreter and how it was compiled.
E.g., with the current development version of 'TIBCO Enterprise Runtime for R' (aka 'TERR')
on this same 8-core Linux box the sum way is considerably faster than
the length(which) way:
  > x <- rexp(1e6, 2)
  > system.time(for(i in 1:100)sum(x>.3 & x<.5))
     user  system elapsed
     1.87    0.03    0.48
  > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
     user  system elapsed
     3.21    0.04    0.83
  > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
     user  system elapsed
     2.19    0.04    0.56
  > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
     user  system elapsed
     0.27    0.01    0.13
  > all.equal(as.vector(r1), as.vector(r2))
  [1] TRUE
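To repeat such timings on your own build, a small harness along these lines may help. compare_counts() is a hypothetical helper, not part of base R. It times each method on the same data and stops if the methods disagree. It assumes R 3.3.0 or later (for findInterval(left.open = TRUE)), lo < hi, and no NAs in x.

```r
## Hypothetical helper: time several ways of counting lo < x < hi on the
## same data, 'reps' times each, and check that they all agree.
## Assumes R >= 3.3.0 (findInterval's 'left.open' argument), lo < hi,
## and no NAs in x (sort() would silently drop them).
compare_counts <- function(x, lo, hi, reps = 100L) {
  v <- sort(x)  # sorted once, outside the timings, as if the data arrived sorted
  methods <- list(
    sum          = function() sum(x > lo & x < hi),
    length_which = function() length(which(x > lo & x < hi)),
    findInterval = function() {
      findInterval(hi, v, left.open = TRUE) - findInterval(lo, v)
    }
  )
  counts <- vapply(methods, function(f) as.numeric(f()), numeric(1))
  stopifnot(all(counts == counts[[1]]))  # every method must give the same count
  elapsed <- vapply(methods, function(f) {
    system.time(for (i in seq_len(reps)) f())[["elapsed"]]
  }, numeric(1))
  data.frame(method = names(methods), count = counts, elapsed = elapsed,
             row.names = NULL)
}
```

For example, compare_counts(rexp(1e6, 2), .3, .5) runs the same comparison as above. The sort is deliberately left out of the timings, which favors findInterval() unless, as in the question below, the data are already sorted.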

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of lcn
> Sent: Friday, April 26, 2013 12:09 PM
> To: Mikhail Umorin
> Cc: r-help at r-project.org
> Subject: Re: [R] speed of a vector operation question
> 
> I think the sum way is the best.
> 
> 
> On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin <mikeumo at gmail.com> wrote:
> 
> > Hello,
> >
> > I am dealing with numeric vectors 10^5 to 10^6 elements long. The values
> > are sorted (with duplicates) in the vector (v). I am obtaining the length
> > of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some
> > scalar variables. What is the most efficient way to do this?
> >
> > I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This
> > seems to me more efficient than length(which(v < c)), but, please,
> > correct me if I'm wrong. So, is there anything faster than what I
> > already use?
> >
> > I'm running R 2.14.2 on Linux kernel 3.4.34.
> >
> > I appreciate your time,
> >
> > Mikhail
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 