[R] bigest part of vector
Stavros Macrakis
macrakis at alum.mit.edu
Tue Feb 24 23:12:24 CET 2009
On Tue, Feb 24, 2009 at 3:01 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> Nothing wrong with prior suggestions, but strictly speaking, (fully) sorting
> the vector is unnecessary.
>
> y[y > quantile(y, 1- p/length(y))]
>
> will do it without the (complete) sort. (But sorting is so efficient anyway,
> I don't think you could notice any difference).
R uses an efficient quantile calculation, so it is significantly
faster for large data sets:
> big <- rnorm(1e7)
> system.time(res<-big[big>=quantile(big,(length(big)-1)/length(big))])
user system elapsed
0.56 0.14 0.70
> system.time(res<-big[big>=quantile(big,(length(big)-100)/length(big))])
user system elapsed
0.75 0.10 0.84
> system.time(res<-big[big>=quantile(big,(length(big)-10000)/length(big))])
user system elapsed
0.61 0.08 0.68
> system.time(res<-big[big>=quantile(big,1/2)])
user system elapsed
1.08 0.08 1.17
> system.time(res<-sort(big))
user system elapsed
4.67 0.03 4.72
> system.time(res<-sort(big)[round(length(big)/2):length(big)])
user system elapsed
4.71 0.10 4.82
Surprisingly, perhaps, "order" is much slower than "sort":
> big <- rnorm(1e7)
> system.time(res<-order(big))
user system elapsed
21.07 0.05 21.14
And you do need to be careful about your handling of ties:
> test <- c(1,2,3,4,4,4)
> test[test>=quantile(test,5/6)]
[1] 4 4 4
> test[test>=quantile(test,6/6)]
[1] 4 4 4
Hope this helps.
-s
More information about the R-help
mailing list