R-beta: read.table and large datasets
Ross Ihaka
ihaka at stat.auckland.ac.nz
Mon Mar 9 20:08:13 CET 1998
RW> From: Rick White <rick at stat.ubc.ca>
RW> Subject: R-beta: read.table and large datasets
RW>
RW> I find that read.table cannot handle large datasets. Suppose data is a
RW> 40000 x 6 dataset
RW>
RW> R -v 100
RW>
RW> x <- read.table("data") gives
RW> Error: memory exhausted
RW> but
RW> x <- as.data.frame(matrix(scan("data"), byrow=TRUE, ncol=6))
RW> works fine.
RW>
RW> read.table is less typing, lets me include the variable names in the
RW> first line, and executes faster in Splus. Is there a fix for read.table
RW> on the way?
[ I wouldn't be too sure that read.table executes faster. I think
it just calls scan ... ]
This is a known R problem. The underlying cause is that read.table reads
everything as character strings, and the implementation of character
strings is "suboptimal". This is a low-level issue, and such issues
are fairly hard to fix because any change affects almost every bit of
code.
As a temporary fix you might try enlarging the memory used for "cons cells"
with the -n flag. Try something like
R -n 400000 -v 10
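The scan-based workaround from the original message can also pick up
variable names from a header line, which was one of the reasons given for
preferring read.table. A minimal sketch, assuming a whitespace-separated,
all-numeric file whose first line holds the names. The temporary file and
the column names a, b, c are made up for illustration, and the nlines and
skip arguments of scan are assumed to be available in the R version at hand:

```r
# Assumption: a whitespace-separated, all-numeric file whose first line
# holds the variable names. A tiny temporary file stands in for "data".
path <- tempfile()
writeLines(c("a b c", "1 2 3", "4 5 6"), path)

# Read only the header line, as character strings.
nms <- scan(path, what = "", nlines = 1, quiet = TRUE)

# Read everything after the header as numbers.
vals <- scan(path, skip = 1, quiet = TRUE)

# Rebuild the rows; ncol comes from the header instead of being hard-coded.
x <- as.data.frame(matrix(vals, byrow = TRUE, ncol = length(nms)))
names(x) <- nms
```

Only the header line is read as character strings here, so the bulk of
the data goes in as numbers. The sketch will not work for files that
contain non-numeric columns.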
Longer term, something will be done about it, but don't hold your breath.
Ross