[R] performance of do.call("rbind")
Sarah Goslee
sarah.goslee at gmail.com
Mon Jun 27 18:51:59 CEST 2016
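There is substantial overhead in rbind.data.frame() because of the
need to check the column types. Converting to matrix makes a huge
difference in speed, but be careful of type coercion: as.matrix()
returns a single-type matrix, so any character or factor column turns
the whole result into character.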
testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
testdf.list <- lapply(1:10000, function(x)testdf)
system.time(r.df <- do.call("rbind", testdf.list))
system.time({
  testm.list <- lapply(testdf.list, as.matrix)
  r.m <- do.call("rbind", testm.list)
})
> testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
> testdf.list <- lapply(1:10000, function(x)testdf)
>
> system.time(r.df <- do.call("rbind", testdf.list))
   user  system elapsed
195.105  36.419 231.930
>
> system.time({
+   testm.list <- lapply(testdf.list, as.matrix)
+   r.m <- do.call("rbind", testm.list)
+ })
   user  system elapsed
  0.603   0.009   0.612
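
To illustrate the coercion caveat, here is a minimal sketch. The data
frames `mixed` and `num` are made up for this example and are not from
the original question. It shows that a mixed-type data frame becomes a
character matrix, while an all-numeric one can be bound as matrices and
turned back into a data frame afterwards.

```r
# Hypothetical data: one integer column and one character column.
mixed <- data.frame(id = 1:3, label = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
m <- as.matrix(mixed)
is.character(m)   # the integer column has been coerced to character

# Hypothetical all-numeric data: the matrix route keeps the numbers intact.
num <- data.frame(x = c(1.5, 2.5), y = c(3, 4))
num.list <- list(num, num, num)

# Bind as matrices, then convert the single result back to a data frame.
r.m <- do.call("rbind", lapply(num.list, as.matrix))
r.df <- as.data.frame(r.m)
rownames(r.df) <- NULL   # drop any row names carried over from the pieces
str(r.df)
```

If your columns are not all of one type, check the result with str()
before relying on it.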
Sarah
On Mon, Jun 27, 2016 at 11:51 AM, Witold E Wolski <wewolski at gmail.com> wrote:
> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <- do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the R session
> after 5 minutes).
>
> I would think that merging data.frames is a common operation. Is
> there a better (more performant) function that I could use?
>
> Thank you.
> Witold