[R] Enormous Datasets
Liaw, Andy
andy_liaw at merck.com
Thu Nov 18 22:33:14 CET 2004
It depends on what you want to do with that data in R. If you want to work
with the whole dataset in memory, just storing it in R will require more than
2.6GB of memory (assuming all the data are numeric and stored as doubles):
> 7e6 * 50 * 8 / 1024^2   # rows * columns * 8 bytes per double, in MB
[1] 2670.288
That's not impossible, but you'll need a computer with quite a bit more
memory than that, running an OS that can give a single process that much
address space. If that's not feasible, you need to re-think what you want to
do with the data in R (e.g., read in and process a small chunk at a time, or
read in only a random sample).
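For illustration, here's a rough sketch of the chunk-at-a-time idea, assuming
a comma-delimited file with a header row (the file name "pums.csv" and the
chunk size are just placeholders; substitute your own file and processing
step):

con <- file("pums.csv", open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]   # column names, read once
repeat {
    chunk <- tryCatch(read.table(con, sep = ",", nrows = 1e5,
                                 col.names = header),
                      error = function(e) NULL)       # NULL when nothing is left
    if (is.null(chunk)) break
    ## ... summarize/filter 'chunk' here, keeping only what you need ...
    if (nrow(chunk) < 1e5) break                      # last, partial chunk
}
close(con)

Supplying colClasses to read.table will also speed up the reading and keep
the memory needed for each chunk down.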
Andy
> From: Thomas W Volscho
>
> Dear List,
> I have some projects where I use enormous datasets. For
> instance, the 5% PUMS microdata from the Census Bureau.
> After deleting cases I may have a dataset with 7 million+
> rows and 50+ columns. Will R handle a datafile of this size?
> If so, how?
>
> Thank you in advance,
> Tom Volscho
>
> ************************************
> Thomas W. Volscho
> Graduate Student
> Dept. of Sociology U-2068
> University of Connecticut
> Storrs, CT 06269
> Phone: (860) 486-3882
> http://vm.uconn.edu/~twv00001
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html