R-beta: Memory Management in R-0.50-a4

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Thu Nov 27 13:35:59 CET 1997

Ian Thurlbeck <ian at stams.strath.ac.uk> writes:

> Dear R users
> we're having a problem reading a largish data file using
> read.table().  The file consists of 175000 lines of 4
> floating pt numbers. Here's what happens:
> I edited the memory limits in Platform.h and re-compiled
> and now read.table() can manage up to around 125000 lines.
> #define R_VSIZE 30000000L       /* 15 times original figure (Defn.h) */
> #define R_NSIZE  1000000L       /* 5 times original figure (Defn.h) */
> #define R_PPSSIZE 100000L       /* 10 times original figure (Defn.h) */

The first two of those are settable via command line options, e.g.

R -v 50 

should get you a 50M memory heap.

> Clearly I can keep upping these values until it works, but has
> the side-effect of making the running R binary pretty big.
> What can I do? Is the answer a better memory management
> system ?

That wouldn't hurt, but... The actual numbers require only about 6M of
storage (175000 x 4 doubles at 8 bytes each), so the extra memory is
needed only while read.table() is running. What you could do is read
the data in a large process, save it to a binary file, and load that
file into a smaller process.
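
A minimal sketch of that two-step approach, assuming an R version where
save() and load() are available. The file names "big.dat" and "big.RData"
are made up for illustration, and a small generated file stands in for the
real 175000-line data set.

```r
# Hypothetical file names; a 10 x 4 stand-in file is generated here
# so the sketch runs on its own.
infile  <- file.path(tempdir(), "big.dat")
binfile <- file.path(tempdir(), "big.RData")
write.table(matrix(runif(40), ncol = 4), infile,
            row.names = FALSE, col.names = FALSE)

# Step 1 (in the large process): parse the text file once and
# write the resulting object out in binary form.
dat <- read.table(infile)
save(dat, file = binfile)

# Step 2 (in a fresh, smaller process): restore the object without
# re-parsing any text. rm() only simulates the fresh session here.
rm(dat)
load(binfile)   # recreates the object named `dat`
dim(dat)
```

The large process is needed only once; later sessions start from the
binary file and never pay the parsing overhead again.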

Another thing you could do is to switch to reading the values with
scan(). read.table() tries to be intelligent about data types and
so forth, which tends to make it inefficient on large data sets.
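
A rough sketch of the scan() route, assuming the file holds nothing but
whitespace-separated numbers, four per line, with no header. The file name
"big4.dat" and the variable names are hypothetical, and a small generated
file again stands in for the real data.

```r
# Hypothetical stand-in file: 10 lines of 4 floating point numbers.
infile <- file.path(tempdir(), "big4.dat")
write.table(matrix(rnorm(40), ncol = 4), infile,
            row.names = FALSE, col.names = FALSE)

# Declare up front that every field is numeric, so scan() does not
# have to work out column types the way read.table() does.
vals <- scan(infile, what = numeric(), quiet = TRUE)

# scan() returns one long vector in file order (line by line),
# so the matrix has to be filled by row.
m <- matrix(vals, ncol = 4, byrow = TRUE)
dim(m)

# Storage for the full-size data: rows x columns x 8 bytes per double.
bytes <- 175000 * 4 * 8   # 5600000 bytes, i.e. the "about 6M" above
```

If a data frame is wanted afterwards, as.data.frame(m) converts the
matrix once the values are safely in memory.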

   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
