[Rd] Speed improvement for Find() and Position()
Olaf Mersmann
olafm at statistik.tu-dortmund.de
Wed Sep 1 14:06:37 CEST 2010
Dear R-developers,
both Find() and Position() are, as the documentation mentions, currently not optimized in any way. I have rewritten both functions to be more efficient by replacing the sapply() call with a for() loop that terminates as soon as a match is found. Here is a patch against the current Subversion HEAD:
http://www.statistik.tu-dortmund.de/~olafm/temp/fp.patch
and here are some numbers to show that this change is worthwhile:
% cat fp_bench.R
set.seed(42)
pred <- function(z) z == 1
for (n in c(10^(2:4))) {
  x <- sample(1:n, 2*n, replace=TRUE)
  tf <- system.time(replicate(1000L, Find(pred, x)))
  message(sprintf("Find    : n=%5i user=%6.3f system=%6.3f",
                  2*n, tf[1], tf[2]))
  ## Time Position() here, not Find() a second time.
  tp <- system.time(replicate(1000L, Position(pred, x)))
  message(sprintf("Position: n=%5i user=%6.3f system=%6.3f",
                  2*n, tp[1], tp[2]))
}
## Unpatched R:
% Rscript fp_bench.R
Find    : n=  200 user= 0.491 system= 0.015
Position: n=  200 user= 0.477 system= 0.014
Find    : n= 2000 user= 4.450 system= 0.083
Position: n= 2000 user= 4.507 system= 0.094
Find    : n=20000 user=63.435 system= 1.497
Position: n=20000 user=63.130 system= 1.328
## Patched R:
% ./bin/Rscript fp_bench.R
Find    : n=  200 user= 0.101 system= 0.013
Position: n=  200 user= 0.085 system= 0.003
Find    : n= 2000 user= 0.781 system= 0.002
Position: n= 2000 user= 0.809 system= 0.012
Find    : n=20000 user=20.537 system= 0.394
Position: n=20000 user=20.502 system= 0.404
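For readers who do not want to open the patch, the early-exit idea can be sketched roughly as follows. This is only an illustration of the approach described above, not the contents of fp.patch: the names position_early() and find_early() are hypothetical, the argument lists simply mirror those of Position() and Find(), and the use of isTRUE() to treat NA predicate results as "no match" is an assumption of this sketch.

```r
## Sketch of an early-exit Position(); hypothetical name, not the patch itself.
position_early <- function(f, x, right = FALSE, nomatch = NA_integer_) {
  f <- match.fun(f)
  idx <- seq_along(x)
  if (right) idx <- rev(idx)          # search from the right-hand end
  for (i in idx) {
    ## Return at the first element for which the predicate holds,
    ## instead of evaluating f on every element as sapply() would.
    if (isTRUE(f(x[[i]]))) return(i)
  }
  nomatch
}

## Sketch of Find() built on top of the early-exit position search.
find_early <- function(f, x, right = FALSE, nomatch = NULL) {
  pos <- position_early(f, x, right, nomatch = 0L)
  if (pos > 0L) x[[pos]] else nomatch
}

x <- c(3L, 1L, 4L, 1L, 5L)
pred <- function(z) z == 1
position_early(pred, x)                # 2
find_early(pred, x)                    # 1
position_early(pred, x, right = TRUE)  # 4
find_early(function(z) z > 10, x)      # NULL
```

The saving depends on where the first match sits: with a match near the start the loop stops almost immediately, while a vector with no match at all still requires one predicate call per element.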
Cheers,
Olaf Mersmann