[R] Sorting and subsetting

Tue Sep 21 12:09:42 CEST 2010

All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the likely
source of quite a few 'out of memory' errors and performance issues in R.
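
For reference, the paradigm in question looks roughly like this in base R.
This is only a sketch for contrast, not any particular poster's solution;
the seed and the names pieces, top5 and result are my own choices.

```r
# Base R sketch of the split/lapply approach: top 5 'foo' values per group.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))

# split() copies the data into one data frame per group,
# lapply() builds one result per group,
# and rbind() copies the pieces back together.
pieces <- split(tmp, tmp$index)
top5 <- lapply(pieces, function(d) d[head(order(-d$foo), 5), ])
result <- do.call(rbind, top5)
```

Every group gets its own intermediate data frame here, which is where the
extra memory and time go as the number of groups grows.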

data.table doesn't do that internally, and its syntax is pretty easy.

> library(data.table)
> tmp <- data.table(index = gl(2,20), foo = rnorm(40))

> tmp[, .SD[head(order(-foo),5)], by=index]
      index index.1       foo
 [1,]     1       1 1.9677303
 [2,]     1       1 1.2731872
 [3,]     1       1 1.1100931
 [4,]     1       1 0.8194719
 [5,]     1       1 0.6674880
 [6,]     2       2 1.2236383
 [7,]     2       2 0.9606766
 [8,]     2       2 0.8654497
 [9,]     2       2 0.5404112
[10,]     2       2 0.3373457
> 

As you can see, it currently repeats the group column (as index.1), which
is a shame; fixing that is on the to-do list.
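
One possible way around the repeated column, sketched here on the assumption
that j may return a list of named columns instead of .SD, is to name the
wanted column explicitly. The seed and the name top5 are arbitrary choices
for the sketch.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
tmp <- data.table(index = gl(2, 20), foo = rnorm(40))

# Return only 'foo' from j; the grouping column is added once by 'by'.
top5 <- tmp[, list(foo = foo[head(order(-foo), 5)]), by = index]
```

The result should then have just the two columns index and foo, with five
rows per group.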

Matthew

http://datatable.r-forge.r-project.org/
