[R] Benchmarking R, why sort() is so slow?

Prof Brian D Ripley ripley at stats.ox.ac.uk
Sat Apr 28 10:27:00 CEST 2001

On Fri, 27 Apr 2001, Philippe Grosjean wrote:

> I suspect it should be a question of algorithm choice. May be sort() is
> important enough, including for large datasets, to place some improvement in
> the "to do" list for a further version...?

I find it hard to believe that doing anything useful with a 1.1 million
length vector will not take very much longer than complete sorting, which
is a rare problem in statistics (not needed for quantiles, for example).
That's the problem with these benchmarks, they don't test realistic
problems because many of the packages concerned would need days of coding
to run one.

FWIW, there are O(n log n) sort algorithms, worst case, that are fairly
competitive on average too.  But testing random vectors is not
representation of real problems, and for many such (and even for
rnorm(1e6)) one can do even better by e.g radix sorting.  So if you know
something about the input you make your package look better. Again,
a problem with benchmarking.

>> Peter Dalgaard

> The internal algorithm is a shellsort, which is supposedly of
> complexity O(n^1.25) and has decent worst-case behaviour. Other
> algorithms like quicksort have typical performance of O(n log n) but
> extreme cases of O(n^2).
> For large vectors the O(n^.25/log(n)) relative complexity is going to
> make a difference, although the observed differences seem larger.

There are several shellsort variants.  I would need to look up the
known results for precisely this one.

> There are very likely better choices of sort algorithm...

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list