[R] Very Large Data Sets

Thomas Lumley thomas at biostat.washington.edu
Thu Dec 23 17:49:06 CET 1999

On Thu, 23 Dec 1999 kmself at ix.netcom.com wrote:
> When dealing with large datasets outside of SAS, my suggestion would be
> to look to tools such as Perl and MySQL to handle the procedural and
> relational processing of data, using R as an analytic tool.  Most simple
> statistics (subsetting, aggregation, drilldown) can be accommodated
> through these sorts of tools.  Think of the relationship to R as
> analogous to the division between the DATA step and SAS/STAT or SAS/GRAPH.
> I would be interested to know of any data cube tools which are freely
> available or available as free software.

The S-PLUS package for the netCDF format, written by Steve Oncley of NCAR,
allows reading of arbitrary "slabs" of a very large data file. At one
point he was planning to write an R version, but I can't remember what
happened and my email records for the relevant time were eaten by a
Microsoft Outlook/Pine disagreement. 

This would allow you to work with large data files one piece at a time (if
they were netCDF files). Something similar could be done with mmap(2) if
your OS allows addressing that much memory (as most soon will).

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
