[R-pkgs] New version of package ff

Jens Oehlschlägel jens.oehlschlaegel at truecluster.com
Fri Nov 6 10:36:17 CET 2009


Dear R community,

ff version 2.1.1 is available on CRAN. It now supports large data.frames,
csv import/export, packed atomic datatypes and bit filtering from package
'bit', on which it now depends.
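
As a rough illustration of the csv import/export mentioned above, here is a minimal sketch. It assumes ff is installed and uses as.ffdf(), write.csv.ffdf() and read.csv.ffdf() as documented in the package; the toy data and the temporary file name are made up for the example.

```r
library(ff)   # also loads 'bit'

# Toy data standing in for a large table (hypothetical columns).
df <- data.frame(id = 1:5, score = c(0.5, 1.5, 2.5, 3.5, 4.5))
fd <- as.ffdf(df)                        # convert to a file-backed ffdf

csv_file <- tempfile(fileext = ".csv")   # hypothetical target file
write.csv.ffdf(fd, file = csv_file)      # export the ffdf to csv
back <- read.csv.ffdf(file = csv_file)   # import the csv as a new ffdf

nrow(back)
```

For real data both functions process the file in batches, so the full table should not need to fit into RAM at once.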

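Some performance results, in seconds, from test data with 78 million rows and 7 columns on a notebook with 3 GB of RAM:

sequential reading of 1 million rows: csv = 32.7  ffdf = 1.3
sequential writing of 1 million rows: csv = 35.5  ffdf = 1.5

That makes ffdf roughly 24 to 25 times faster than csv in both directions.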

Examples of things you can do with ff and bit:
- direct random access to rows of a large data frame instead of talking to an SQL database (?ffdf)
- store a 4-level factor such as A, T, G, C with 2 bits instead of 32 bits (?vmode)
- fast chunked iteration (?chunk)
- run a linear model on a large dataset using biglm (?chunk.ffdf)
- handle boolean selections faster and with less RAM, by a factor of 32 (?bit)
- handle very skewed selections very fast (?bitwhich)
- parallel access to a large dataset just by sending ff's small metadata from master to slaves (e.g. with snowfall)
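
A few of the items above can be sketched as follows. This is only a small illustration under stated assumptions, not a benchmark: the data frame, its column names and the chunk size are invented for the example, and the 2-bit case stores plain integer codes 0..3 (standing in for the four levels) rather than a real factor.

```r
library(ff)   # also loads 'bit'

# Toy data standing in for a large table (hypothetical columns).
n  <- 10000
df <- data.frame(id = seq_len(n), value = seq_len(n) / 2)
fd <- as.ffdf(df)                  # columns are stored in files on disk

# Direct random access to rows: the result is an ordinary data.frame.
rows <- fd[c(1, 5000, 10000), ]

# Chunked iteration: chunk() returns a list of row ranges, so only one
# chunk of the column is pulled into RAM at a time.
total <- 0
for (i in chunk(fd, by = 2500)) {
  total <- total + sum(fd$value[i])
}

# Packed storage: vmode "quad" uses 2 bits per element (values 0..3).
q <- ff(c(0L, 3L, 2L, 1L), vmode = "quad")

# Boolean selection as a bit vector: 1 bit per element instead of the
# 32 bits of an R logical.
sel <- as.bit(df$value > 4990)

# The same, very skewed, selection as a bitwhich object.
skew <- as.bitwhich(df$value > 4990)

sum(sel)
```

The loop over chunk() is the same pattern that ?chunk.ffdf describes for feeding a large dataset to biglm piece by piece.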

ff is now hosted on R-Forge, and you can find some presentations on ff at
http://ff.r-forge.r-project.org/

We hope you find this useful and appreciate any feedback.


Jens & Daniel


