[R] Large datasets under R
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Wed Feb 23 09:27:00 CET 2000
On Tue, 22 Feb 2000, Stephen R. Laniel wrote:
> I recall reading a thread months ago on this mailing list about handling
> very large datasets under R, but I can't seem to find it. This has become
> particularly important recently, because I've been playing with a dataset
> containing information about every fatal car accident in the U.S. since
> 1975; in total, the relevant files are about 120 megs. I'd like to load
> all of these into R at once and do some longitudinal analyses. R seems to
> choke on tables above a few megabytes. From what I can tell, these are
> memory management issues; the user must allocate memory using command-line
Yes, ?Memory will tell you how.
> I'm using R 0.90 under Windows NT 4.0. Have things improved in more recent
> versions? Are they expected to improve soon?
No, not for that size of dataset. The current garbage collector is too
slow: this might change fairly soon.
> In the meantime, what can we
> do to make large-dataset analysis more convenient?
The idea is to use interfaces to databases to pull over just the bits of
the dataset which are needed, including doing the analyses in chunks. There
are several database interfaces around, and plans to pull them into a
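A minimal sketch of the "pull over just the bits, analyse in chunks" idea follows. It uses Python's built-in sqlite3 module as a stand-in for whichever database interface is actually used. The table name `accidents`, its columns `year`, `state` and `fatalities`, and the helper `yearly_means` are hypothetical and exist only for illustration. The query selects only the two columns the analysis needs. Rows are then fetched `chunk_size` at a time and folded into running sums and counts, so the full table is never held in memory at once.

```python
import sqlite3
from collections import defaultdict


def yearly_means(conn, table="accidents", chunk_size=1000):
    """Hypothetical helper: mean of `fatalities` per `year`, read in chunks.

    Only the two needed columns are requested from the database, and at
    most `chunk_size` rows are held in memory at any one time.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Ask the database for just the columns this analysis uses.
    cur = conn.execute(f"SELECT year, fatalities FROM {table}")
    while True:
        rows = cur.fetchmany(chunk_size)  # next chunk; empty list when done
        if not rows:
            break
        for year, fatalities in rows:
            sums[year] += fatalities
            counts[year] += 1
    # Combine the per-chunk running totals into per-year means.
    return {year: sums[year] / counts[year] for year in sorted(sums)}


if __name__ == "__main__":
    # Toy in-memory table standing in for the real (much larger) dataset.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accidents (year INTEGER, state TEXT, fatalities INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accidents VALUES (?, ?, ?)",
        [(1975, "CA", 1), (1975, "NY", 3),
         (1976, "CA", 2), (1976, "TX", 2), (1976, "NY", 5)],
    )
    print(yearly_means(conn, chunk_size=2))  # {1975: 2.0, 1976: 3.0}
```

If the database can compute a summary itself, for example with `SELECT year, AVG(fatalities) FROM accidents GROUP BY year`, that is simpler still. The chunked loop is mainly useful for analyses the database cannot express directly.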
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595