[R] Using plyr::ddply more (memory) efficiently?

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Apr 29 18:05:26 CEST 2010


Hi Matthew,

> Sounds like it's working, but could you give us an idea whether it is quick
> and memory efficient?

I actually can't believe what I'm seeing. I just recoded the function
to use data.table.

What had been taking on the order of ~20-30 minutes with an
lapply/do.call(rbind, ...) combo (actually I was using sqldf to do
quicker subselects) just finished in < 1 minute.
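For concreteness, here is a minimal sketch of the kind of split/apply/combine
pattern I mean -- the data and column names (dat, grp, score) are made up,
not my actual code:

library(plyr)

## made-up example data
dat <- data.frame(grp   = sample(letters, 1e6, replace = TRUE),
                  score = rnorm(1e6))

## the lapply/do.call(rbind, ...) combo: split by group, summarize
## each piece, then rbind the pieces back into one data.frame
res1 <- do.call(rbind, lapply(split(dat, dat$grp), function(d) {
  data.frame(grp = d$grp[1], mean.score = mean(d$score))
}))

## the ddply version of the same aggregation
res2 <- ddply(dat, "grp", summarise, mean.score = mean(score))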

The memory used in my R workspace is now still under 2GB, whereas
previously it was ~8GB when do.call(rbind, ...)-ing my list into a
data.frame, and over 20GB with ddply.
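And a sketch of the data.table equivalent of that aggregation (same made-up
names as above) -- the grouping happens inside [.data.table, with no
intermediate list of per-group data.frames to rbind:

library(data.table)

dt <- as.data.table(dat)   # same made-up data as in the sketch above

## one grouped aggregation in a single call
res3 <- dt[, list(mean.score = mean(score)), by = "grp"]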

I'm going to double-check that I get the same results, but for now
I'm completely blown away.

data.table is awesome, thanks for this package.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


