[R] Find the 50 highest values in a matrix
Henrik Bengtsson
hb at stat.berkeley.edu
Fri Jun 18 13:39:36 CEST 2010
You might also want to consider _partial sorting_ by using the
'partial' argument of sort(), especially when the number of data
points is really large.
Since 'decreasing=TRUE' is not supported together with 'partial',
you have to flip the order yourself by negating the values, e.g.
x <- rnorm(8e6);
is.na(x) <- sample(length(x), size=1e6);
n <- 50;
t1 <- system.time({
x1 <- sort(x, decreasing=TRUE);
x1h <- x1[1:n];
});
t2 <- system.time({
x2 <- sort(-x, partial=n);
x2h <- -sort(x2[1:n]);
});
stopifnot(identical(x2h, x1h));
print(t2/t1);
user system elapsed
0.3076923 0.7777778 0.3491525
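
The timing above only recovers the values, while the original question asked for row and column positions. A minimal sketch of one way to get them from the partial sort follows. It assumes a smaller toy matrix, and the names 'm', 'thr', 'idx' and 'top' are made up for illustration. The n-th largest value serves as a threshold for which(..., arr.ind=TRUE). If the data contain ties at the threshold, more than n positions can come back, so treat this as a starting point and not a finished solution.

```r
# Toy data (assumed for illustration): a 400 x 200 matrix with some NAs.
set.seed(1)
m <- matrix(rnorm(400 * 200), nrow = 400)
is.na(m) <- sample(length(m), size = 1000)
n <- 50

# sort() drops NAs by default. After partially sorting -m, element n is the
# n-th smallest of -m, i.e. minus the n-th largest non-NA value of m.
thr <- -sort(-m, partial = n)[n]

# Row/column positions of all values at or above the threshold.
# Comparisons with NA give NA, which which() ignores.
idx <- which(m >= thr, arr.ind = TRUE)

# Order the hits by decreasing value (matrix indexing with a 2-column matrix).
top <- idx[order(m[idx], decreasing = TRUE), , drop = FALSE]
print(head(cbind(top, value = m[top])))
```

Only the n selected values are fully sorted here, so the extra cost on top of the partial sort is one pass over the matrix for the comparison.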
/Henrik
On Fri, Jun 18, 2010 at 1:20 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
>
> m <- matrix(round(rnorm(4000 * 2000), 4), nrow = 4000)
> is.na(m) <- sample(8e6, 1e6)
>
> system.time(
>   idx <- which(
>     matrix(m %in% head(sort(m, TRUE), 50),
>            nrow = nrow(m)), arr.ind = TRUE))
>
> # user system elapsed
> # 3.12 0.19 3.18
>
> -Peter Ehlers
>
>
> On 2010-06-18 5:13, Dennis Murphy wrote:
>>
>> Hi:
>>
>> Here's a faked up example:
>>
>> a<- matrix(rnorm(4000*2000), 4000, 2000)
>> # Generate some NAs in the matrix
>> nr<- sample(1:4000, 50)
>> nc<- sample(1:2000, 50)
>> a[nr, nc]<- NA
>>
>> # convert to data frame:
>> b<- data.frame(row = rep(1:4000, 2000), col = rep(1:2000, each = 4000),
>> x = as.vector(a))
>> # relatively time consuming...about 13.5 s on my machine
>> bb<- b[rev(order(b$x, na.last = FALSE)), ]
>>
>> bb[1:10, ]
>>
>>          row  col        x
>> 691269  3269  173 5.103704
>> 7815076 3076 1954 4.961544
>> 4999621 3621 1250 4.953265
>> 500469   469  126 4.937655
>> 5878224 2224 1470 4.929150
>> 4287270 3270 1072 4.913791
>> 4442521 2521 1111 4.896869
>> 4668867  867 1168 4.863504
>> 5716575  575 1430 4.760778
>> 3055274 3274  764 4.758995
>>
>> HTH,
>> Dennis
>>
>>
>> On Thu, Jun 17, 2010 at 10:41 PM, uschlecht <ulrich.schlecht at stanford.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> I have a huge matrix (4000 * 2000 data points) and I would like to retrieve
>>> the coordinates (column and row) for the top 50 (or x) values. Some
>>> positions in the matrix have NA as a value. These should be discarded.
>>>
>>> My current method is to replace all NAs by 0, then rank all the values and
>>> then extract the positions with the 50 highest ranks. It is very
>>> time-consuming!
>>>
>>> Is there a simpler way to do this?
>>>
>>> Thank you,
>>> Ulrich
>>>
>>> --
>>> View this message in context:
>>>
>>> http://r.789695.n4.nabble.com/Find-the-50-highest-values-in-a-matrix-tp2259721p2259721.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>