[R] Ordering long vectors

Göran Broström gb at stat.umu.se
Sat Jun 7 18:29:20 CEST 2003


I need to order a long vector of integers with rather few unique values.
This is very slow:

> x <- sample(rep(c(1:10), 50000))
> system.time(ord <- order(x))
[1] 189.18   0.09 190.48   0.00   0.00

But with no ties

> y <- sample(500000)
> system.time(ord1 <- order(y))
[1] 1.18 0.00 1.18 0.00 0.00

it is very fast!
This gave me the following idea: since I don't care about the order
within tied values, why not add a small random disturbance to x (well
below 0.5 in absolute value, so that distinct integers cannot change
places)? And indeed,

> system.time(ord2 <- order(x + runif(length(x), -0.1, 0.1)))
[1] 1.66 0.00 1.66 0.00 0.00

> identical(x[ord], x[ord2])
[1] TRUE

it works! 
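
The trick above could be wrapped up roughly as follows. This is only a
sketch under the assumption that x is integer-valued; the name
order_jitter and its eps argument are made up for illustration and are
not part of R.

```r
# Hypothetical helper (not part of R): order an integer-valued vector,
# breaking ties at random by adding a small uniform disturbance.
order_jitter <- function(x, eps = 0.1) {
  # eps must stay below 0.5, otherwise the disturbed values of two
  # neighbouring integers could overlap and change places
  stopifnot(eps > 0, eps < 0.5)
  order(x + runif(length(x), -eps, eps))
}

set.seed(1)                       # only to make the example reproducible
x <- sample(rep(1:10, 50000))     # long vector, few unique values
ord2 <- order_jitter(x)

# The sorted values agree with a plain sort; only the order within
# tied values is arbitrary.
identical(x[ord2], sort(x))
```

Note that the result is a valid ordering permutation, but unlike
order(x) it does not keep tied values in their original order.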

Is there an obvious (=better) solution to this problem that I have 
overlooked? In any case, I think that the problem with order and many 
ties is worth mentioning in the help page. 

For the record: R-1.7.0, RH9

Göran
---
 Göran Broström                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Umeå University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se



