[R] Largest N Values Efficiently?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Nov 12 07:56:12 CET 2007
What is 'x' here? What type? Does it contain NAs? Are there ties? R's
ordering functions are rather general, and you can gain efficiency by
ruling some of these out.
See ?sort, look at the 'partial' argument, including the comments in the
Details. And also look at ?sort.list.
sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is
more efficient than x[order(x)][1:n] for most types.
Finally, does efficiency matter? As the examples in ?sort show, R can
sort a vector of length 2000 is well under 1ms, and 1e7 random normals in
less time than they take to generate. There are not many tasks where
gaining efficiency over x[order(x)][1:n] will be important. E.g.
> system.time(x <- rnorm(1e6))
user system elapsed
0.44 0.00 0.44
> system.time(x[order(x)][1:4])
user system elapsed
1.72 0.00 1.72
> system.time(x2 <- sort.int(x, method = "quick")[1:4])
user system elapsed
0.31 0.00 0.32
> system.time(min(x))
user system elapsed
0.02 0.00 0.02
> system.time(x2 <- sort.int(x, partial=1)[1])
user system elapsed
0.07 0.00 0.07
and do savings of tenths of a second matter? (There is also
quantreg::kselect, if you work out how to use it, which apparently is
a bit faster at partial sorting on MacOS X but not elsewhere.)
On Sun, 11 Nov 2007, David Katz wrote:
>
> What is the most efficient alternative to x[order(x)][1:n] where
> length(x)>>n?
That is the smallest n values, pace your subject line.
> I also need the positions of the mins/maxs perhaps by preserving names.
>
> Thanks for any suggestions.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list