[R] Largest N Values Efficiently?
David Katz
david at davidkatzconsulting.com
Mon Nov 12 17:36:36 CET 2007
x is a 1XN sparse matrix of numerics. I am using the Matrix package to
represent as a sparse matrix; the representation has a numeric vector
representing the positions within the matrix. My goal is find the columns
with the n largest values, here positive correlations. Part of my strategy
is to only sort the nonzeros which are available as a numeric vector.
Thanks for your interest and input.
Prof Brian Ripley wrote:
>
> What is 'x' here? What type? Does it contain NAs? Are there ties? R's
> ordering functions are rather general, and you can gain efficiency by
> ruling some of these out.
>
> See ?sort, look at the 'partial' argument, including the comments in the
> Details. And also look at ?sort.list.
>
> sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is
> more efficient than x[order(x)][1:n] for most types.
>
> Finally, does efficiency matter? As the examples in ?sort show, R can
> sort a vector of length 2000 is well under 1ms, and 1e7 random normals in
> less time than they take to generate. There are not many tasks where
> gaining efficiency over x[order(x)][1:n] will be important. E.g.
>
>> system.time(x <- rnorm(1e6))
> user system elapsed
> 0.44 0.00 0.44
>> system.time(x[order(x)][1:4])
> user system elapsed
> 1.72 0.00 1.72
>> system.time(x2 <- sort.int(x, method = "quick")[1:4])
> user system elapsed
> 0.31 0.00 0.32
>> system.time(min(x))
> user system elapsed
> 0.02 0.00 0.02
>> system.time(x2 <- sort.int(x, partial=1)[1])
> user system elapsed
> 0.07 0.00 0.07
>
> and do savings of tenths of a second matter? (There is also
> quantreg::kselect, if you work out how to use it, which apparently is
> a bit faster at partial sorting on MacOS X but not elsewhere.)
>
>
> On Sun, 11 Nov 2007, David Katz wrote:
>
>>
>> What is the most efficient alternative to x[order(x)][1:n] where
>> length(x)>>n?
>
> That is the smallest n values, pace your subject line.
>
>> I also need the positions of the mins/maxs perhaps by preserving names.
>>
>> Thanks for any suggestions.
>>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
View this message in context: http://www.nabble.com/Largest-N-Values-Efficiently--tf4788033.html#a13708965
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list