[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)
ripley@stats.ox.ac.uk
ripley at stats.ox.ac.uk
Sat Mar 4 18:36:51 CET 2006
Deepayan,
The current algorithm is really designed for partial of length 1, and is
more or less proportional to the length of partial. So inevitably it is
slow in your (pretty unrealistic?) example. I have temporarily altered it
so a barebones full sort is done if partial has more than 10 elements, the
changeover point for a million-element vector on my system.
John Chambers wrote a paper on this in 1971 (and I know that from his 1977
book). It is possible to write partial sorting to be about as fast as
sorting in all cases (at least if partial is sorted) and much faster if
partial is small. But I am not sure it is really worth the bother when
full sorting is so fast even on a million elements.
Brian
On Thu, 2 Mar 2006, deepayan.sarkar at gmail.com wrote:
> Hi,
>
> This is essentially a reposting of
>
> http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
>
> which had no responses, and the behaviour reported there persists in
> r-devel as of yesterday.
>
> (1) sort() with non-null partial
>
>> x = rnorm(100000)
>> keep = as.integer(ppoints(10000) * 100000)
>> system.time(sort(x))
> [1] 0.05 0.00 0.04 0.00 0.00
>> system.time(sort(x, partial = keep))
> [1] 52.04 0.02 52.08 0.00 0.00
>
> This is perhaps not strictly a bug, but taking approximately 1000
> times longer to do a subset of the work seems pointless at best.
>
> (2) quantile.default() always calls sort() with a non-null partial
> argument. Consequently,
>
>> system.time(quantile(x, ppoints(10000)))
> [1] 88.82 0.05 88.90 0.00 0.00
>
> There's no way around this except by writing a custom version of
> quantile. lattice currently does this, giving
>
>> system.time(lattice:::fast.quantile(x, ppoints(10000)))
> [1] 0.07 0.01 0.08 0.00 0.00
>
> Which brings me to my wishlist request: if (1) cannot be fixed easily,
> could quantile.default() at least have an optional argument that can
> be used to disable partial sorting?
>
>> sessionInfo()
> Version 2.3.0 Under development (unstable) (2006-02-28 r37448)
> x86_64-unknown-linux-gnu
>
> attached base packages:
> [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
> [7] "base"
>
> Deepayan
> --
> http://www.stat.wisc.edu/~deepayan/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-devel
mailing list