[R] performance of do.call("rbind")

Sarah Goslee sarah.goslee at gmail.com
Mon Jun 27 18:51:59 CEST 2016


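There is substantial overhead in rbind.data.frame() because of the
need to check the column types. Converting each data frame to a matrix
first makes a huge difference in speed, but be careful of type
coercion: a matrix holds a single type, so mixed columns are all
converted to a common one.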

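# 10,000 copies of a 100 x 3 all-numeric data frame
testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
testdf.list <- lapply(1:10000, function(x) testdf)

# rbind on the data frames directly
system.time(r.df <- do.call("rbind", testdf.list))

# convert each data frame to a matrix first, then rbind
system.time({
  testm.list <- lapply(testdf.list, as.matrix)
  r.m <- do.call("rbind", testm.list)
})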
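
To illustrate the coercion caveat, here is a minimal sketch. The
data frames mixed and num are made up for this example and are not
from the original question.

```r
# Hypothetical mixed-type data frame (illustration only)
mixed <- data.frame(id = 1:3, label = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
m <- as.matrix(mixed)
typeof(m)    # "character": the integer column became strings

# An all-numeric data frame stays numeric after conversion
num <- data.frame(x = c(1.5, 2.5), y = c(3, 4))
typeof(as.matrix(num))    # "double"
```

So the matrix route is only safe as-is when every column is already
numeric, as in testdf above.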


> testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
> testdf.list <- lapply(1:10000, function(x)testdf)
>
> system.time(r.df <- do.call("rbind", testdf.list))
   user  system elapsed
195.105  36.419 231.930
>
> system.time({
+ testm.list <- lapply(testdf.list, as.matrix)
+ r.m <- do.call("rbind", testm.list)
+ })
   user  system elapsed
  0.603   0.009   0.612
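
If the columns have mixed types, one possible workaround is to
combine the list one column at a time, so nothing is coerced to a
common type. This is only a sketch: rbind_columns is a hypothetical
helper, not a base R function, and it assumes every data frame has
the same column names and only plain atomic columns (classes such as
Date would be dropped by unlist). I have not timed it on the data
above.

```r
# Hypothetical helper (not part of base R): bind a list of data frames
# that share the same column names, one column at a time.
rbind_columns <- function(df.list) {
  col.names <- names(df.list[[1]])
  cols <- lapply(col.names, function(nm) {
    # Concatenate this column across all data frames
    unlist(lapply(df.list, `[[`, nm), use.names = FALSE)
  })
  names(cols) <- col.names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Small made-up input to show that column types are preserved
small.list <- list(
  data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(id = 3:4, label = c("c", "d"), stringsAsFactors = FALSE)
)
r.cols <- rbind_columns(small.list)
str(r.cols)
```

The data.table (rbindlist) and dplyr (bind_rows) packages also
provide functions for this job, if adding a dependency is acceptable.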

Sarah

On Mon, Jun 27, 2016 at 11:51 AM, Witold E Wolski <wewolski at gmail.com> wrote:
> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <-do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the rsession
> after 5 minutes).
>
> I would think that merging data.frame's is a common operation. Is
> there a better function (more performant) that I could use?
>
> Thank you.
> Witold
>
>
>


