[R] Simple Lookup... why so slow
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Fri Aug 6 16:45:03 CEST 2004
The first two solutions are vastly slower than the last three simply because
they use a for() loop. The vectorised versions are much faster.
# Solution 1 : list extraction operator
aa <- rep(NA, n); bb <- rep(NA, n)
system.time( for (i in 1:n) {
  aa[i] <- PatDay$Day[i] - StartDay[PatDay$Treat[i], PatDay$Pat[i]] } )
[1] 0.33 0.00 0.33 0.00 0.00
# Solution 2 : numeric index with for loop
system.time( for (i in 1:n) {
  bb[i] <- PatDay[i, 1] - StartDay[PatDay[i, 3], PatDay[i, 2]] } )
[1] 15.43 0.12 17.76 0.00 0.00
# Solution 3 : Vectorised operation with numeric index
system.time( cc <- PatDay[ , 1] - StartDay[ as.matrix(PatDay[, 3:2]) ] )
[1] 0.01 0.00 0.01 0.00 0.00
# Solution 4 : Vectorised operation with named index
system.time( dd <- PatDay[ , "Day"] -
             StartDay[ as.matrix(PatDay[ , c("Treat", "Pat")]) ] )
[1] 0.01 0.00 0.01 0.00 0.00
# Solution 5 : Vectorised operation with list extractor
system.time( ee <- PatDay$Day - StartDay[ cbind(PatDay$Treat, PatDay$Pat) ] )
[1] 0 0 0 0 0
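Solutions 3 to 5 all rely on the same feature: indexing a matrix with a
two-column matrix, where each row of the index is one (row, column) pair.
Here is a minimal sketch on a toy matrix; the names m, idx and picked are
made up for illustration and are not part of the original problem.

```r
# Toy 3 x 4 matrix, filled column by column; values are arbitrary
m <- matrix(1:12, nrow = 3)

# Each row of idx is one (row, column) pair to look up
idx <- cbind(c(1, 3, 2), c(2, 4, 1))

# One vectorised lookup, equivalent to c(m[1, 2], m[3, 4], m[2, 1])
picked <- m[idx]
print(picked)
```

In the solutions above, the Treat column plays the role of the row index and
the Pat column the role of the column index into StartDay.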
There is insufficient precision to say which of the vectorised
operations is fastest. So I tried the same thing with n=400,000, and the
last 3 gave the following timings
Solution 3 : [1] 1.67 0.21 1.89 0.00 0.00
Solution 4 : [1] 2.55 0.21 2.77 0.00 0.00
Solution 5 : [1] 0.25 0.03 0.28 0.00 0.00
However, when I redefined PatDay as a matrix, for n=400,000
Solution 3 : [1] 0.48 0.04 0.51 0.00 0.00
Solution 4 : [1] 0.26 0.04 0.31 0.00 0.00
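For reference, here is a rough sketch of what redefining PatDay as a matrix
could look like. The name PatMat, the seed and n = 1000 are my own choices for
illustration, and the DayOff column of the original post is left out. It only
shows that both forms give the same result; timings will vary by machine and
R version.

```r
set.seed(1)   # arbitrary seed, only to make the sketch repeatable
n <- 1000     # small n for a quick check, not the n used for the timings
StartDay <- matrix(as.integer(runif(80) * 20), nrow = 4)
PatDay <- data.frame(Day   = as.integer(runif(n) * 20) + 50L,
                     Pat   = as.integer(runif(n) * 20) + 1L,
                     Treat = as.integer(runif(n) * 4) + 1L)

# Hypothetical name: a matrix copy of the data frame
PatMat <- as.matrix(PatDay)

# Data frame version (Solution 4) needs as.matrix() on every call
from_df  <- PatDay[ , "Day"] -
            StartDay[ as.matrix(PatDay[ , c("Treat", "Pat")]) ]

# Matrix version: the two-column index is already a matrix
from_mat <- PatMat[ , "Day"] - StartDay[ PatMat[ , c("Treat", "Pat")] ]

print(all(from_df == from_mat))
```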
Just to make sure all the answers are the same, try this
cor( cbind(aa, bb, cc, dd) )
   aa bb cc dd
aa  1  1  1  1
bb  1  1  1  1
cc  1  1  1  1
dd  1  1  1  1
or the slow way : all.equal(aa, bb); all.equal(aa, cc); ...
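One caveat: a correlation of 1 only shows that the vectors are perfectly
linearly related, not that they are equal, so all.equal() is the stricter
check. A small sketch with made-up vectors x and y (not from the problem
above):

```r
x <- c(1, 2, 3, 4)
y <- x + 10    # differs from x everywhere, yet perfectly correlated

r_xy       <- cor(x, y)                        # 1, up to rounding
same_xy    <- isTRUE(all.equal(x, y))          # FALSE: the values differ
same_close <- isTRUE(all.equal(x, x + 1e-12))  # TRUE: within default tolerance
same_exact <- identical(x, x + 1e-12)          # FALSE: identical() is exact

print(c(r_xy = r_xy, same_xy = same_xy,
        same_close = same_close, same_exact = same_exact))
```

all.equal() returns a description of the differences rather than FALSE when
the vectors differ, which is why it is wrapped in isTRUE() here.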
Regards, Adai
On Fri, 2004-08-06 at 13:42, Dieter Menne wrote:
> Dear List,
>
> At 32 degrees Celsius in the office, I was too lazy to figure out
> the correct xapplytion for a simple lookup problem
> and regressed to well-known c-style. Only to see my
> computer hang forever doing 10000 indexed offset calculations.
> Boiled down, the problem is shown below; needs a few milliseconds
> in c. Looking at the timing results of n=2000 and n=4000,
> this is not linear in time, so something I don't understand
> must go on.
>
> And, just as an aside: why is $-indexing so much faster (!)
> than numeric indexing?
>
> Dieter
>
> (all on Windows, latest R-Version)
> ----
>
> # Generate Data set
> StartDay = matrix(as.integer(runif(80)*20),nrow=4)
> n=4000
> PatDay = data.frame(Day = as.integer(runif(n)*20)+50,
> Pat= as.integer(runif(n)*20)+1,
> Treat = as.integer(runif(n)*4)+1,
> DayOff=NA) # reserve output space
> # Correct for days offset
> ti= system.time(
> for (i in 1:n)
> PatDay$DayOff[i] = PatDay$Day[i]-StartDay[PatDay$Treat[i],PatDay$Pat[i]]
> )
> cat("$Style index",n,ti[3],"\n");
> # n= 2000 3 seconds
> # n= 4000 15 seconds
>
> # I first believed using numeric indexes could be faster...
> ti= system.time(
> for (i in 1:n)
> PatDay[i,4] = PatDay[i,1]-StartDay[PatDay[i,3],PatDay[i,2]]
> )
> cat("Numeric index", n,ti[3],"\n");
> # n=2000 12 seconds
> # n=4000 53 seconds
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>