[R] big data?

Mike Harwood maharwood7 at gmail.com
Wed Aug 6 18:39:41 CEST 2014


The read.table.ffdf function in the ff package can read delimited files 
and store them on disk as individual columns.  The ffbase package provides 
additional data management and analytic functionality.  I have used these 
packages on 15 GB files of 18 million rows and 250 columns.
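
For anyone who wants to try this, here is a minimal sketch of a chunked 
read with read.table.ffdf.  The temporary file, the column names (id, 
value) and the chunk sizes are made up for illustration and are not from 
my actual data; a real 1.5 GB file would want much larger chunks.  The 
sketch uses only ff, not ffbase.

```r
# Sketch only: file, column names, and chunk sizes are illustrative assumptions.
library(ff)

# Write a small tab-delimited file to stand in for a large one.
tsv_path <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:1000, value = seq(0.5, 500, by = 0.5))
write.table(demo, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file in chunks; each column ends up in its own file on disk.
# first.rows sets the size of the first chunk, next.rows the later ones.
big <- read.table.ffdf(file = tsv_path, header = TRUE, sep = "\t",
                       first.rows = 100, next.rows = 400)

# Dimensions are available without pulling the data into RAM.
dim(big)

# `[]` on a single column loads just that column as an ordinary vector.
total <- sum(big$value[])
```

Only the column you index with `[]` is brought into memory, which is what 
makes this workable when the whole table does not fit in RAM.  Note that 
ff does not store character columns, so text fields need to be read as 
factors.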


On Tuesday, August 5, 2014 1:39:03 PM UTC-5, David Winsemius wrote:
>
>
> On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote: 
>
> >      What tools do you like for working with tab-delimited text files up 
> > to 1.5 GB (under Windows 7 with 8 GB RAM)? 
>
> ?data.table::fread 
>
> >      Standard tools for smaller data sometimes grab all the available 
> > RAM, after which CPU usage drops to 3% ;-) 
> > 
> > 
> >      The "bigmemory" project won the 2010 John Chambers Award but "is 
> > not available (for R version 3.1.0)". 
> > 
> > 
> >      findFn("big data", 999) downloaded 961 links in 437 packages. That 
> > contains tools for data in PostgreSQL and other formats, but I couldn't find 
> > anything for large tab-delimited text files. 
> > 
> > 
> >      Absent a better idea, I plan to write a function getField to 
> > extract a specific field from the data, then use that to split the data 
> > into 4 smaller files, which I think should be small enough that I can do 
> > what I want. 
>
> There is the colbycol package with which I have no experience, but I 
> understand it is designed to partition data into column sized objects. 
> #--- from its help file----- 
> cbc.get.col {colbycol}        R Documentation 
> Reads a single column from the original file into memory 
>
> Description 
>
> Function cbc.read.table reads a file, stores it column by column in disk 
> file and creates a colbycol object. Function cbc.get.col queries this object 
> and returns a single column. 
>
> >      Thanks, 
> >      Spencer 
> > 
> > ______________________________________________ 
> > R-h... at r-project.org mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-help 
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html 
> > and provide commented, minimal, self-contained, reproducible code. 
>
> David Winsemius 
> Alameda, CA, USA 
>

