[Rd] HOW TO AVOID LOOPS

Tue Apr 15 01:41:38 CEST 2008

On Mon, 14 Apr 2008, Stephen Milborrow wrote:

> > Le sam. 12 avr. à 12:47, carlos martinez a écrit :
> > Looking for a simple, effective a minimum execution time solution.
> >
> > For a vector as:
> >
> > c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
> >
> > To transform it to the following vector without using any loops:
> >
> > (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>
> Here is a fast solution using the Ra just-in-time compiler
> www.milbo.users.sonic.net/ra.
>
> jit(1)
> if (length(x) > 1)
>     for (i in 2:length(x))
>         if (x[i])
>             x[i] <- x[i-1] + 1
>
> The times in seconds for various solutions mailed to r-devel are listed
> below. There is some variation between runs and with the contents of x. The
> times shown are for
>
> set.seed(1066);  x <- as.double(runif(1e6) > .5)
>
> This was tested on a WinXP 3 GHz Pentium D with Ra 1.0.7 (based on R 2.6.2).
> The code to generate these results is attached.
>
> vin     24
> greg   11
> had    3.9
> dan    1.4
> dan2  1.4
> jit       0.25    # code is shown above, 7 secs with standard R 2.6.2>

Stephen's solution is certainly easy to read and write.

Another solution, if I understand the scope of the problem, is
   f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

I made a script to run all the functions I noticed
(except for the library(inline) one) on various 0-1
vectors of length one million and got the following
timings on R 2.6.2 running on my Windows laptop
(Lenovo T61, Core 2 Duo at 2 GHz).   "error thrown"
means the function call died and "incorrect" means it
returned the wrong answer.

   > source("z:/dumpdata.R")
   Timing stopped at: 0.05 0 0.05 NA NA
   Error in dots[[1L]][[1L]] : subscript out of bounds
   Timing stopped at: 0.14 0.06 0.21 NA NA
   Error in x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n)) :
     incompatible types (from NULL to double) in subassignment type fix
      all.ones     all.zeros    few.long  many.short
   f1    7.02         7.03         7.04      7.03
   f2    0.13         0.13         0.13      2.52
   f3 error thrown   35.47      incorrect incorrect
   f4    0.19      error thrown    0.21      1.20
   f5    0.28         0.09         0.29      0.18
   f6    5.40         0.78         5.42      3.14
   f7    0.06         0.05         0.06      0.06

I've attached the script so you can figure out whose
function is whose if you care to.  The lapply/mapply
solution, f3, required that there be 1's at both ends of the
input vector.  Perhaps I miscopied the code.

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."
----------------------------------------------------------------------------

The test script:

len <- 1e6 # length of vectors in tests

funs <- list(
`f1` = function(x)Reduce( function(x,y) x*y + y, x, accumulate=TRUE ),

`f2` = function(x)x * unlist(lapply(rle(x)$lengths, seq_len)),

`f3` = function(x){
    ind <- which(x == 0)
    unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))),
        function(y) cumsum(x[y]))) },

`f4` = function(x) {
    d <- diff(c(0,x,0))
    starts <- which(d == 1)
    ends <- which(d == -1)
    x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n))
    x },

`f5` = function(x) {
    if (existsFunction("jit")) jit(1) else stop("no jit available")
    if (length(x) > 1)
        for (i in 2:length(x))
            if (x[i])
                x[i] <- x[i-1] + 1
    x
    },
`f6` = # same as f5, but not compiled with jit.
    function(x) {
        if (existsFunction("jit")) jit(0)
        if (length(x) > 1)
            for (i in 2:length(x))
                if (x[i])
                    x[i] <- x[i-1] + 1
        x
    },
`f7` =
    function(x) {
        tmp<-cumsum(x)
        tmp-cummax((!x)*tmp)
    }
)

data <- list(
    all.ones = rep(1, len),
    all.zeros = rep(0, len),
    few.long = rep( c(rep(1,len/10-1),0), len=len), # 10 long runs
    many.short = rep(c(1,0), len=len) # len/2 runs of length 1
)
expected <- list(
    all.ones = 1:len,
    all.zeros = rep(0, len),
    few.long = rep( c(1:(len/10-1), 0), len=len),
    many.short = rep(c(1,0), len=len)
)

print(noquote(sapply(names(data),
   function(d) sapply(names(funs),
      function(f){
          time<-try(unix.time(gcFirst=TRUE, val<-funs[[f]](data[[d]])))
          if (is(time, "try-error"))
              "error thrown"
          else if (!isTRUE(report <- all.equal(val, expected[[d]])))
              "incorrect"
          else
              sprintf("%7.2f", unname(time[1]+time[2]))
      }
   )
)))