[R] Enormous Datasets
andy_liaw at merck.com
Thu Nov 18 22:33:14 CET 2004
It depends on what you want to do with that data in R. If you want to play
with the whole data, just storing it in R will require more than 2.6GB of
memory (assuming all data are numeric and are stored as doubles):
> 7e6 * 50 * 8 / 1024^2   # megabytes
[1] 2670.288
That's not impossible, but you'll need to be on a computer with quite a bit
more memory than that, and running on an OS that supports it. If that's not
feasible, you need to re-think what you want to do with that data in R
(e.g., read in and process a small chunk at a time, or read in a random
sample of the rows rather than the full data).
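A minimal sketch of the chunk-at-a-time idea, reading from an open
connection so `read.csv` picks up where the previous chunk left off (the
file name, chunk size, and the running totals here are illustrative, not
from the original post):

```r
## Stand-in for the real multi-gigabyte file:
demo <- tempfile(fileext = ".csv")
write.table(data.frame(x = 1:250, y = rnorm(250)), demo,
            sep = ",", row.names = FALSE, quote = FALSE)

con <- file(demo, open = "r")
col_names <- strsplit(readLines(con, n = 1), ",")[[1]]  # consume header
chunk_size <- 100            # for real PUMS data, something like 1e5
n_rows <- 0
sum_x  <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size,
             col.names = col_names),
    error = function(e) NULL)  # read.csv errors once the file is exhausted
  if (is.null(chunk)) break
  n_rows <- n_rows + nrow(chunk)   # accumulate only what you need,
  sum_x  <- sum_x + sum(chunk$x)   # never the whole data set
  if (nrow(chunk) < chunk_size) break
}
close(con)
n_rows   # all 250 rows were seen, at most 100 held in memory at a time
```

Specifying `colClasses` in the `read.csv` call will speed this up
considerably on a file this large, since R then skips type guessing.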
> From: Thomas W Volscho
> Dear List,
> I have some projects where I use enormous datasets. For
> instance, the 5% PUMS microdata from the Census Bureau.
> After deleting cases I may have a dataset with 7 million+
> rows and 50+ columns. Will R handle a datafile of this size?
> If so, how?
> Thank you in advance,
> Tom Volscho
> Thomas W. Volscho
> Graduate Student
> Dept. of Sociology U-2068
> University of Connecticut
> Storrs, CT 06269
> Phone: (860) 486-3882
> R-help at stat.math.ethz.ch mailing list