[R] Large datasets under R
ihaka at stat.auckland.ac.nz
Wed Feb 23 09:50:29 CET 2000
On Wed, Feb 23, 2000 at 08:27:00AM +0000, Prof Brian D Ripley wrote:
> On Tue, 22 Feb 2000, Stephen R. Laniel wrote:
> > Hello,
> > I recall reading a thread months ago on this mailing list about handling
> > very large datasets under R, but I can't seem to find it. This has become
> > particularly important recently, because I've been playing with a dataset
> > containing information about every fatal car accident in the U.S. since
> > 1975; in total, the relevant files are about 120 megs. I'd like to load
> > all of these into R at once and do some longitudinal analyses. R seems to
> > choke on tables above a few megabytes. From what I can tell, these are
> > memory management issues; the user must allocate memory using command-line
> > switches.
> Yes, ?Memory will tell you how.
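For context, R of this era sized its heaps at startup rather than growing them on demand. A typical invocation looked something like the following; the exact flag syntax and accepted suffixes varied between versions, so ?Memory in your own installation is the authoritative reference:

```sh
## Start R with a larger vector heap (--vsize) and more cons cells
## (--nsize). Flag spellings are as documented around R 0.90/1.0 and
## may differ in other versions -- check ?Memory before relying on them.
R --vsize=100M --nsize=1000000
```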
> > I'm using R 0.90 under Windows NT 4.0. Have things improved in more recent
> > versions? Are they expected to improve soon?
> No, not for that size of dataset. The current garbage collector is too
> slow: this might change fairly soon.
Just to add some details. The easy (== feasible) approach here is to
store all system objects using "malloc" when they are assigned,
rather than putting them into the space examined by the collector.
This means that the collector will only inspect the user's data and
hopefully this will speed things up a lot.
In addition, Robert has plans to support demand loading of objects
from external files using exception handling. This means that only
the objects being used need reside in memory. This is very much like
the S approach.
Mucking about with memory management is quite likely to break a ton
of stuff, so it won't happen until we get a bit more stability
(i.e. post 1.0).
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the body, not the subject!)
to: r-help-request at stat.math.ethz.ch