[R] Splitting data.frame into a list of small data.frames given indices
Witold E Wolski
wewolski at gmail.com
Wed Jun 29 11:16:56 CEST 2016
This is the inverse of the problem of merging a list of data.frames into
one large data.frame, which was just discussed in the
"performance of do.call("rbind")" thread.
I would like to split a data.frame into a list of data.frames
according to the values in its first column.
This seems to be easily possible with the function base::by. However,
as soon as the data.frame has a few million rows, this function becomes
unusable unless you have plenty of time: the runtime of 'by' grows
roughly as nrow^2, i.e. O(n^2) (see the benchmark below).
So basically I am looking for a similar function with better complexity.
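One candidate that might be worth timing in the same way is base::split, which has a data.frame method. The following is only a minimal sketch of the intended result, assuming a toy data.frame `peaks_demo` with made-up column names as a stand-in for the real `peaks` data; it has not been benchmarked here, so whether it scales better than 'by' on millions of rows is an open assumption.

```r
# Toy stand-in for the real `peaks` data (hypothetical names and values).
peaks_demo <- data.frame(
  id        = c(3, 1, 3, 2, 1),
  mz        = c(10.1, 20.2, 30.3, 40.4, 50.5),
  intensity = c(100, 200, 300, 400, 500)
)

# split() partitions the rows of columns 2:3 by the values in column 1
# and returns a named list of data.frames, one per distinct id.
pieces <- split(peaks_demo[, 2:3], peaks_demo[, 1])

names(pieces)   # one list element per distinct value of the first column
pieces[["3"]]   # the rows whose first column equals 3
```

Wrapping the `split` call in `system.time` for the same `nrows` values would give a like-for-like comparison with the 'by' timings.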
> nrows <- c(1e5,1e6,2e6,3e6,5e6)
> timing <- list()
> for(i in nrows){
+   dum <- peaks[1:i,]
+   timing[[length(timing)+1]] <- system.time(x <- by(dum[,2:3],
+     INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE))
+ }
> names(timing)<- nrows
> timing
$`1e+05`
   user  system elapsed
   0.05    0.00    0.05

$`1e+06`
   user  system elapsed
   1.48    2.98    4.46

$`2e+06`
   user  system elapsed
   7.25   11.39   18.65

$`3e+06`
   user  system elapsed
  16.15   25.81   41.99

$`5e+06`
   user  system elapsed
  43.22   74.72  118.09
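
As a rough check of the quadratic claim, the growth exponent can be estimated from the elapsed times above with a log-log fit. This is only a sketch: the variable names `n`, `elapsed`, `fit` and `slope` are introduced here for illustration, and four data points give no more than an indication.

```r
# Row counts and elapsed seconds copied from the benchmark output above.
# The 1e5 run is left out: 0.05 s is too short to measure reliably.
n       <- c(1e6, 2e6, 3e6, 5e6)
elapsed <- c(4.46, 18.65, 41.99, 118.09)

# If elapsed ~ c * n^k, then log(elapsed) is linear in log(n) with slope k.
fit   <- lm(log(elapsed) ~ log(n))
slope <- unname(coef(fit)[2])
slope
```

A slope close to 2 is consistent with runtime growing as nrow^2 over this range.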
--
Witold Eryk Wolski