R-beta: read.table and large datasets

Ross Ihaka ihaka at stat.auckland.ac.nz
Mon Mar 9 20:08:13 CET 1998


RW> From: Rick White <rick at stat.ubc.ca>
RW> Subject: R-beta: read.table and large datasets
RW> 
RW> I find that read.table cannot handle large datasets. Suppose data is a
RW> 40000 x 6 dataset
RW> 
RW> R -v 100
RW> 
RW> x_read.table("data")  gives
RW> Error: memory exhausted
RW> but
RW> x_as.data.frame(matrix(scan("data"),byrow=T,ncol=6))
RW> works fine.
RW> 
RW> read.table is less typing, I can include the variable names in the first
RW> line and in Splus executes faster. Is there a fix for read.table on the
RW> way?

[ I wouldn't be too sure that read.table executes faster.  I think
  it just calls scan ... ]

This is a known R problem.  The real problem is that read.table reads
everything as character strings and the implementation of character
strings is "suboptimal".  This is a low-level problem, and such problems
are fairly hard to fix because any change to them affects almost every
part of the code.
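Rick's scan() workaround loses the variable names that read.table would take from the first line. Below is a minimal sketch of getting them back. It assumes a whitespace-separated file whose first line holds the names and whose remaining lines are all numeric. The header is read once as character strings, and the numbers are then read with scan(), so the bulk of the data is never stored as strings. The sketch is written for current R (`<-` assignment and the `what`, `nlines`, `skip` and `quiet` arguments of scan()), which may differ from the 1998 R-beta. The small temporary file is a hypothetical stand-in for the real "data" file.

```r
# Hypothetical example file: a header line of variable names followed by
# whitespace-separated numeric rows (stands in for the 40000 x 6 "data").
path <- tempfile()
writeLines(c("a b c", "1 2 3", "4 5 6"), path)

# Read only the first line, as character strings: the variable names.
hdr <- scan(path, what = "", nlines = 1, quiet = TRUE)

# Read everything after the header as numbers (scan's default type is
# double), so the numeric fields are not kept as character strings.
vals <- scan(path, skip = 1, quiet = TRUE)

# Each row was written on its own line, so fill the matrix by row.
x <- as.data.frame(matrix(vals, ncol = length(hdr), byrow = TRUE))
names(x) <- hdr
```

For the real file, set `path <- "data"` and drop the writeLines() call.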

As a temporary fix you might try enlarging the memory used for "cons cells"
with the -n flag.  Try something like

	R -n 400000 -v 10

Longer term, something will be done about it, but don't hold your breath.

	Ross
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


