[R] Fast way of finding top-n values of a long vector
Allan Engelhardt
allane at cybaea.com
Thu Jun 4 20:51:31 CEST 2009
Greg Snow wrote:
> Try adding a version that uses sort with the partial argument, that should be faster than regular sort (for long enough test vectors) and possibly faster than the max solutions when finding more than just the largest 2.
I find the documentation for the partial argument in sort very difficult
to understand, but it seems to be something like this:
library("rbenchmark")
set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
benchmark(
replications=20,
columns=c("test","elapsed"),
order="elapsed"
, sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
, qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2];
a[1]/a[2];}
, part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
, max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1];
m/max(x[-w],na.rm=TRUE);}
, max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
)
# test elapsed
# 5 max2 0.846
# 4 max1 1.957
# 3 part 2.752
# 2 qsrt 4.561
# 1 sort 5.577
Completely agree on your point about partial sort being faster for
larger values of n and certainly giving more scalable code which was
also what I was looking for so thanks for that tip!
> Also, for your max solutions, what happens when the 2 largest values are identical?
It returns those two values, just like the sort solution:
x <- c(999,NA,1:10,NA,999)
m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; c(m, max(x[-w],na.rm=TRUE))
# [1] 999 999
sort(x, decreasing=TRUE, na.last=NA)[1:2]
# [1] 999 999
Allan.
More information about the R-help
mailing list