[R] Splitting data.frame into a list of small data.frames given indices

Witold E Wolski wewolski at gmail.com
Wed Jun 29 11:16:56 CEST 2016

This is the inverse of the problem of merging a list of data.frames into one
large data.frame, which was just discussed in the "performance of do.call("rbind")" thread.

I would like to split a data.frame into a list of data.frames
according to its first column.
This SEEMS easy to do with base::by. However, as soon as the
data.frame has a few million rows, this function CANNOT BE USED
(unless you have PLENTY OF TIME).

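For 'by' the runtime grows roughly as nrow^2, i.e. empirically O(n^2) (see the benchmark below):
going from 1e6 to 2e6 rows raises the elapsed time from 4.46 s to 18.65 s (about 4x), and
5e6 rows take about 26 times as long as 1e6 rows.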
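
A rough way to check the exponent is a log-log fit on the elapsed times reported in the benchmark below: if runtime ~ c * n^k, then log(runtime) is linear in log(n) with slope k. A minimal sketch (the smallest timing, 0.05 s, is coarse, so treat the estimate as approximate):

```r
# Elapsed times (seconds) copied from the benchmark output in this post
nrows   <- c(1e5, 1e6, 2e6, 3e6, 5e6)
elapsed <- c(0.05, 4.46, 18.65, 41.99, 118.09)

# If runtime ~ c * n^k, then log(runtime) = log(c) + k * log(n),
# so the slope of a log-log linear fit estimates the exponent k
fit <- lm(log(elapsed) ~ log(nrows))
k <- unname(coef(fit)[2])
round(k, 2)
```

On these five timings the fitted slope comes out close to 2, consistent with quadratic growth.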

So basically I am looking for a similar function with better complexity.
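
One candidate from base R is `split()`, which returns a plain named list with one data.frame per distinct key. The sketch below applies it to a made-up toy data.frame (the column names `id`, `mz`, `intensity` are hypothetical, not the real `peaks` columns) just to show the structure of the result; whether it scales better than `by()` on the real data is not established here.

```r
# Hypothetical toy data: the first column holds the grouping key
df <- data.frame(id        = c(2, 1, 2, 3, 1),
                 mz        = c(100.1, 200.2, 300.3, 400.4, 500.5),
                 intensity = c(10, 20, 30, 40, 50))

# One data.frame per distinct value of the first column (columns 2:3 only,
# as in the by() call below); list names are the sorted distinct keys
parts <- split(df[, 2:3], df[, 1])

names(parts)    # "1" "2" "3"
parts[["2"]]    # rows 1 and 3 of df, columns mz and intensity
```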

> nrows <- c(1e5,1e6,2e6,3e6,5e6)
> timing <- list()
> for(i in nrows){
+ dum <- peaks[1:i,]
+ timing[[length(timing)+1]] <- system.time(x<- by(dum[,2:3],
+ INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE))
+ }
> names(timing)<- nrows
> timing
$`1e+05`
   user  system elapsed
   0.05    0.00    0.05

$`1e+06`
   user  system elapsed
   1.48    2.98    4.46

$`2e+06`
   user  system elapsed
   7.25   11.39   18.65

$`3e+06`
   user  system elapsed
  16.15   25.81   41.99

$`5e+06`
   user  system elapsed
  43.22   74.72  118.09
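
`peaks` is my own data and is not included in the post. To let others reproduce the scaling, here is a minimal sketch with a synthetic stand-in: `make_peaks()` and its columns are hypothetical, and it assumes roughly 10 rows per key so that the number of groups grows with n, which may not match the real data. It times the same `by()` call as above next to base `split()`, at small sizes so it finishes quickly.

```r
# Hypothetical stand-in for `peaks`: integer key in column 1, two numeric
# columns. Assumes ~10 rows per key on average, so groups grow with n.
make_peaks <- function(n, rows_per_key = 10) {
  data.frame(key = sample(ceiling(n / rows_per_key), n, replace = TRUE),
             mz  = runif(n),
             int = runif(n))
}

set.seed(1)
sizes  <- c(1e3, 2e3, 4e3)   # small sizes; increase them to probe scaling
timing <- list()
for (n in sizes) {
  dum <- make_peaks(n)
  # Same call as in the benchmark above
  t_by <- system.time(
    x_by <- by(dum[, 2:3], INDICES = list(dum[, 1]),
               FUN = function(x) x, simplify = FALSE))
  # Base R alternative returning a plain named list
  t_split <- system.time(x_split <- split(dum[, 2:3], dum[, 1]))
  # Keep user, system and elapsed times for both approaches
  timing[[as.character(n)]] <- rbind(by = t_by[1:3], split = t_split[1:3])
}
timing
```

As far as I can tell from the base R sources, both `by.data.frame` and `split.data.frame` split the row indices by the key and subset the data.frame once per group, so the per-group pieces should be identical; only the container differs (a `by` object vs. a plain list).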

Witold Eryk Wolski
