[R] performance of do.call("rbind")

Hervé Pagès hpages at fredhutch.org
Mon Jun 27 21:58:42 CEST 2016


Hi,

Note that if your list of 200k data frames is the result of splitting
a big data frame, then trying to rbind the result of the split is
equivalent to reordering the original big data frame. More precisely,

   do.call(rbind, unname(split(df, f)))

is equivalent to

   df[order(f), , drop=FALSE]

(except for the rownames), but the latter is *much* faster, since it
reorders the rows in a single indexing step instead of combining 200k
small data frames!
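
To check the equivalence on a small scale, here is a minimal sketch. The
toy data frame and its column names (id, value, group) are made up for
illustration; only df and f mirror the names used above. It assumes f has
no NA values, since split() drops those rows while order() keeps them at
the end, so the two results would differ in that case.

```r
# Toy data; `df` and `f` mirror the names used above, the values are made up.
set.seed(1)
df <- data.frame(id = 1:12,
                 value = rnorm(12),
                 group = sample(c("a", "b", "c"), 12, replace = TRUE))
f <- factor(df$group)   # grouping factor, assumed to contain no NAs

# Slow route: split into per-group data frames, then rbind them back.
via_rbind <- do.call(rbind, unname(split(df, f)))

# Fast route: a single reordering of the rows. order() keeps tied values
# in their original order, which matches the within-group order of split().
via_order <- df[order(f), , drop = FALSE]

# Reset the row names before comparing, since they are the part that may differ.
rownames(via_rbind) <- NULL
rownames(via_order) <- NULL
same <- isTRUE(all.equal(via_rbind, via_order))
print(same)
```

On real data, wrapping each route in system.time() should show the
difference in speed.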

Cheers,
H.


On 06/27/2016 08:51 AM, Witold E Wolski wrote:
> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <-do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the rsession
> after 5 minutes).
>
> I would think that merging data.frame's is a common operation. Is
> there a better function (more performant) that I could use?
>
> Thank you.
> Witold

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
