[Rd] Quicker quantiles?
roger koenker
roger at ysidro.econ.uiuc.edu
Sat Mar 11 20:28:38 CET 2006
Motivated by Deepayan's recent inquiries about the efficiency of the
R 'quantile'
function:
http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
http://tolstoy.newcastle.edu.au/R/devel/06/03/4358.html
I decided to try to revive an old project to implement a version of
the Floyd
and Rivest (1975) algorithm for finding quantiles with O(n)
comparisons. I
used to have a direct translation from FR's algol68 in my quantreg
package,
but was forced to remove it when it encountered g77 compilers that
didn't like the way I had handled recursion.
Fortunately, in the interim, I've corresponded with Krzysztof Kiwiel
in Warsaw
who has made a detailed study of the algorithm:
K.C. Kiwiel: On Floyd and Rivest's SELECT Algorithm, Theoretical
Computer Sci. 347 (2005) 214-238.
He has also kindly agreed to allow me to incorporate his
implementation of
the FR algorithm in R under GPL.
For the moment, I have made an alpha-test package with a single
function --
'kuantile' -- that attempts to reproduce the functionality of the
current
base quantile function, essentially replacing the partial sorting done
there with calls to Kiwiel's 'select' function. This package is
available
from:
http://www.econ.uiuc.edu/~roger/research/rq/kuantile
The good news is that the new function _is_ somewhat quicker than
the base 'quantile' function. The following table is based on means
of 10 replications. The first two columns report times for computing
just the median, the next two columns for computing the five default
quantiles: seq(0,1,.25), and the last column is the time needed for
a call
of sort().
mean system.time()[1] for various calls*
median only default 5
n quantile kuantile quantile kuantile sort
100 0.002 0.001 0.004 0.004 0.000
1000 0.002 0.001 0.003 0.002 0.002
10000 0.003 0.002 0.009 0.005 0.005
1e+05 0.024 0.010 0.061 0.013 0.031
1e+06 0.206 0.120 0.535 0.142 0.502
1e+07 2.132 0.790 5.575 1.035 7.484
* On a mac G5 running R 2.2.1.
The bad news is that this approach doesn't help with Deepayan's
original question which involved instances in which quantile() was
being called for rather large number of probabilities. In such
cases it seems that it is difficult to improve upon doing a full sort.
I would welcome comments on any of this....
url: www.econ.uiuc.edu/~roger Roger Koenker
email: rkoenker at uiuc.edu Department of Economics
vox: 217-333-4558 University of Illinois
fax: 217-244-6678 Champaign, IL 61820
More information about the R-devel
mailing list