[R] Fast way of finding top-n values of a long vector
Allan Engelhardt
allane at cybaea.com
Thu Jun 4 10:18:19 CEST 2009
If x is a (long) vector and n << length(x), what is a fast way of
finding the top-n values of x?
Some suggestions (calculating the ratio of the two top values):
library("rbenchmark")
set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
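benchmark(
  replications=20,
  columns=c("test","elapsed"),
  order="elapsed"
  # full sort, then take the two largest values
  , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
  # find the maximum, drop its first occurrence, take the maximum again
  , max = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w], na.rm=TRUE);}
  # as 'max', but locate the maximum with which.max()
  , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
)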
#   test elapsed
# 3 max2   0.772
# 2  max   1.732
# 1 sort   4.958
I want to apply this code to a few tens of thousands of vectors so speed
does matter. In C or similar I would of course calculate the result
with a single pass through x, and not with three passes as in 'max2'.
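One base-R candidate that avoids a full sort is the partial argument of sort(). The sketch below is not a single pass in the C sense, only a selection that should be cheaper than a full sort when n is small. The helper name top_n is made up for illustration, NAs are dropped explicitly so that the partial indices refer to real values, and, as far as I can tell, decreasing=TRUE cannot be combined with partial, hence the rev(). No timings are claimed for it.

```r
# Hypothetical helper: the n largest non-NA values of x, largest first.
top_n <- function(x, n) {
  y <- x[!is.na(x)]                  # drop NAs so the indices below refer to real values
  len <- length(y)
  n <- min(n, len)
  if (n < 1L) return(y[0])           # nothing to return
  idx <- seq.int(len - n + 1L, len)  # positions the n largest occupy in ascending order
  # Partial sort: the elements at positions idx end up in their final sorted places.
  # With more than 10 indices R does a full sort instead, so this only helps for small n.
  rev(sort(y, partial = idx)[idx])
}

set.seed(1); x <- runif(1e6, max=1e7); x[1] <- NA;
a <- top_n(x, 2)
a[1]/a[2]   # ratio of the two largest values, as in the 'sort' variant
```

It could be added to the benchmark() call as one more named expression, e.g. partial = {a<-top_n(x, 2); a[1]/a[2];}.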
Allan.
PS: I know na.last=NA is the default for sort, but there is no harm in
being explicit about how you want NAs handled.
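For reference, a small illustration of the three na.last settings of sort(); the vector v is made up for the example:

```r
v <- c(3, NA, 1)
r_drop  <- sort(v, na.last = NA)     # NAs removed:      1 3
r_last  <- sort(v, na.last = TRUE)   # NAs placed last:  1 3 NA
r_first <- sort(v, na.last = FALSE)  # NAs placed first: NA 1 3
```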