[Rd] HOW TO AVOID LOOPS
Bill Dunlap
bill at insightful.com
Tue Apr 15 01:41:38 CEST 2008
On Mon, 14 Apr 2008, Stephen Milborrow wrote:
> > Le sam. 12 avr. à 12:47, carlos martinez a écrit :
> > Looking for a simple, effective a minimum execution time solution.
> >
> > For a vector as:
> >
> > c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
> >
> > To transform it to the following vector without using any loops:
> >
> > (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>
> Here is a fast solution using the Ra just-in-time compiler
> www.milbo.users.sonic.net/ra.
>
> jit(1)
> if (length(x) > 1)
> for (i in 2:length(x))
> if (x[i])
> x[i] <- x[i-1] + 1
>
> The times in seconds for various solutions mailed to r-devel are listed
> below. There is some variation between runs and with the contents of x. The
> times shown are for
>
> set.seed(1066); x <- as.double(runif(1e6) > .5)
>
> This was tested on a WinXP 3 GHz Pentium D with Ra 1.0.7 (based on R 2.6.2).
> The code to generate these results is attached.
>
> vin 24
> greg 11
> had 3.9
> dan 1.4
> dan2 1.4
> jit 0.25 # code is shown above, 7 secs with standard R 2.6.2>
Stephen's solution is certainly easy to read and write.
Another solution, if I understand the scope of the problem, is
f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}
I made a script to run all the functions I noticed
(except for the library(inline) one) on various 0-1
vectors of length one million and got the following
timings on R 2.6.2 running on my Windows laptop
(Lenovo T61, Core 2 Duo at 2 GHz). "error thrown"
means the function call died and "incorrect" means it
returned the wrong answer.
> source("z:/dumpdata.R")
Timing stopped at: 0.05 0 0.05 NA NA
Error in dots[[1L]][[1L]] : subscript out of bounds
Timing stopped at: 0.14 0.06 0.21 NA NA
Error in x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n)) :
incompatible types (from NULL to double) in subassignment type fix
all.ones all.zeros few.long many.short
f1 7.02 7.03 7.04 7.03
f2 0.13 0.13 0.13 2.52
f3 error thrown 35.47 incorrect incorrect
f4 0.19 error thrown 0.21 1.20
f5 0.28 0.09 0.29 0.18
f6 5.40 0.78 5.42 3.14
f7 0.06 0.05 0.06 0.06
I've attached the script so you can figure out whose
function is whose if you care to. The lapply/mapply
solution, f3, required that there be 1's at both ends of the
input vector. Perhaps I miscopied the code.
----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
"All statements in this message represent the opinions of the author and do
not necessarily reflect Insightful Corporation policy or position."
----------------------------------------------------------------------------
The test script:
len <- 1e6 # length of vectors in tests
funs <- list(
`f1` = function(x)Reduce( function(x,y) x*y + y, x, accumulate=TRUE ),
`f2` = function(x)x * unlist(lapply(rle(x)$lengths, seq_len)),
`f3` = function(x){
ind <- which(x == 0)
unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))),
function(y) cumsum(x[y]))) },
`f4` = function(x) {
d <- diff(c(0,x,0))
starts <- which(d == 1)
ends <- which(d == -1)
x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n))
x },
`f5` = function(x) {
if (existsFunction("jit")) jit(1) else stop("no jit available")
if (length(x) > 1)
for (i in 2:length(x))
if (x[i])
x[i] <- x[i-1] + 1
x
},
`f6` = # same as f5, but not compiled with jit.
function(x) {
if (existsFunction("jit")) jit(0)
if (length(x) > 1)
for (i in 2:length(x))
if (x[i])
x[i] <- x[i-1] + 1
x
},
`f7` =
function(x) {
tmp<-cumsum(x)
tmp-cummax((!x)*tmp)
}
)
data <- list(
all.ones = rep(1, len),
all.zeros = rep(0, len),
few.long = rep( c(rep(1,len/10-1),0), len=len), # 10 long runs
many.short = rep(c(1,0), len=len) # len/2 runs of length 1
)
expected <- list(
all.ones = 1:len,
all.zeros = rep(0, len),
few.long = rep( c(1:(len/10-1), 0), len=len),
many.short = rep(c(1,0), len=len)
)
print(noquote(sapply(names(data),
function(d) sapply(names(funs),
function(f){
time<-try(unix.time(gcFirst=TRUE, val<-funs[[f]](data[[d]])))
if (is(time, "try-error"))
"error thrown"
else if (!isTRUE(report <- all.equal(val, expected[[d]])))
"incorrect"
else
sprintf("%7.2f", unname(time[1]+time[2]))
}
)
)))
More information about the R-devel
mailing list