[R] Working with large datafiles

Huntsinger, Reid reid_huntsinger at merck.com
Mon Oct 4 18:19:24 CEST 2004

Out of the box, R keeps everything in memory, so a million wide records could
easily take all your RAM. What do you need to do with all the data at once?
Some suggestions (not original by any means):
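For a rough sense of scale, here is a back-of-the-envelope check. The numbers are an illustration, not from the original question, and the width of 50 numeric columns is a hypothetical assumption. A double takes 8 bytes, so that width already needs hundreds of MB before R makes any copies while reading. The snippet computes that lower bound, then measures a small sample with object.size() and scales it up:

```r
# Lower bound for an all-numeric table held in RAM: 8 bytes per double.
# Character/factor columns and copies made while reading add more.
n_rows <- 1e6
n_cols <- 50                      # hypothetical "wide" record width
est_mb <- n_rows * n_cols * 8 / 2^20
est_mb                            # about 381 MB (4e8 bytes)

# Measure a small sample directly and scale it up to n_rows.
sample_df <- as.data.frame(matrix(0, nrow = 1e4, ncol = n_cols))
per_row <- as.numeric(object.size(sample_df)) / nrow(sample_df)
per_row * n_rows / 2^20           # estimated MB for the full table
```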

1) Read the data via the "connection" functions, which let you, for example,
keep the data gzipped (see help(gzfile)) and read it a chunk at a time,
e.g., in order to
2) sample it.
3) If you really need more or less random access to records, look into the
database access packages for PostgreSQL, Oracle, etc., or have a look at the
RObjectTables package from Omegahat (I don't have experience with it yet).
4) I wrote some R functions to "stash" objects on disk so they're still
"there" just like any R object but don't use RAM. Each access reads the
whole object, though, and each write writes the whole object, so they're not
at all suited to random access. Let me know if they would help.
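Here is a minimal sketch of suggestions 1) and 2) together. It assumes a comma-separated file with one header line. The example file, the function name sample_gz_csv() and the chunk size are all hypothetical. The file is read through a gzfile() connection with readLines(), chunk_size lines at a time. A uniform random sample of k records is kept with reservoir sampling, which is my choice of sampling method and not something the suggestion prescribes.

```r
# Hypothetical example data: a small gzipped CSV in a temp file.
path <- tempfile(fileext = ".csv.gz")
out <- gzfile(path, "w")
writeLines("id,value", out)
writeLines(paste(1:25000, round(runif(25000), 4), sep = ","), out)
close(out)

# Read through a connection, chunk_size lines at a time, keeping a
# uniform random sample of k data lines (reservoir sampling).
sample_gz_csv <- function(path, k, chunk_size = 5000) {
  con <- gzfile(path, "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  reservoir <- character(0)
  seen <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break          # end of file
    for (line in lines) {
      seen <- seen + 1
      if (seen <= k) {
        reservoir[seen] <- line            # fill the reservoir first
      } else {
        j <- sample.int(seen, 1)           # keep with probability k/seen
        if (j <= k) reservoir[j] <- line
      }
    }
  }
  # Parse only the sampled lines into a data frame.
  read.csv(text = c(header, reservoir))
}

set.seed(1)
smp <- sample_gz_csv(path, k = 1000)
dim(smp)
```

Only one chunk plus the k sampled lines is in memory at any time. So memory use depends on k and chunk_size rather than on the size of the file.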

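The stash functions from 4) aren't shown here, so the following is only a hypothetical sketch of the same idea. It uses base R's makeActiveBinding(), saveRDS() and readRDS(). The name stash() and its arguments are my own. The bound name behaves like an ordinary variable, but every read deserializes the whole file and every assignment rewrites it. That is why this approach suits whole-object use rather than random access.

```r
# Hypothetical sketch: bind a name to an .rds file so the object lives on
# disk. Every read calls readRDS() on the whole file; every assignment
# calls saveRDS() on the whole new value. Nothing is cached in RAM.
stash <- function(name, value, file = tempfile(fileext = ".rds"),
                  env = parent.frame()) {
  saveRDS(value, file)
  rm(value)                              # don't keep a copy in the closure
  makeActiveBinding(name, function(v) {
    if (missing(v)) readRDS(file) else saveRDS(v, file)
  }, env)
  invisible(file)
}

big <- data.frame(x = 1:1e5, y = rnorm(1e5))
f <- stash("big_on_disk", big)
rm(big)
mean(big_on_disk$y)                      # reads the whole object from disk
big_on_disk <- head(big_on_disk, 10)     # rewrites the whole file
nrow(big_on_disk)
```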
Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Greg Butler
Sent: Monday, October 04, 2004 10:49 AM
To: R-help at stat.math.ethz.ch
Subject: [R] Working with large datafiles


I have been enjoying R for some time now, but was wondering about working
with larger data files. When I try to load big files with more than
20,000 records, the program seems unable to store all the records. Is
there some way I can increase the number of records I can work with?
Ideally I would like to work with census data which can hold a million