[R] large data set, error: cannot allocate vector
Robert Citek
rwcitek at alum.calberkeley.org
Fri May 5 18:30:15 CEST 2006
Oops. I was off by an order of magnitude. I meant 10^7 and 10^8
rows of data for the first and second data sets, respectively.
On May 5, 2006, at 10:24 AM, Robert Citek wrote:
> R > foo <- read.delim("dataset.010MM.txt")
>
> R > summary(foo)
>      X15623
>  Min.   :    1
>  1st Qu.: 8152
>  Median :16459
>  Mean   :16408
>  3rd Qu.:24618
>  Max.   :32766
I reloaded the 10 MM set and ran object.size():
R > object.size(foo)
[1] 440000376
So, 10 MM numbers take about 440 MB. (Are my units correct?) If so, that
works out to roughly 44 bytes per value, well beyond the 4 bytes a 32-bit
integer needs (8 bits/byte * 4 bytes = 32 bits) or the 8 bytes of a double;
the rest is presumably data-frame overhead, such as the row names that
read.delim creates. Either way, it would explain why 10 MM numbers do load
while 100 MM numbers won't: 10^8 values at ~44 bytes each comes to more
than 4 GB, past what a 32-bit machine can address.
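For comparison, here is a quick sketch of what plain vectors of the same
length should take up (the byte counts below are just what I would expect
from R's 4-byte integers and 8-byte doubles plus a small header, not
figures measured on the 10 MM file):

  x <- 1:10000000        # 10 MM values as a plain integer vector
  object.size(x)         # roughly 40 MB, i.e. about 4 bytes per value
  y <- as.numeric(x)     # the same values stored as doubles
  object.size(y)         # roughly 80 MB, i.e. about 8 bytes per value

Either way, a bare vector is quite a bit less than the 440 MB the data
frame came out to.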
From Googling the archives, the solution I've seen for working with large
data sets seems to be moving to a 64-bit architecture. Short of that, are
there any other generic workarounds, perhaps using an RDBMS or a CRAN
package, that enable working with arbitrarily large data sets?
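One workaround I can picture (just a sketch; the file name, the one-column
layout, and the chunk size are placeholders, not something I've run on the
real 100 MM file) is to stream the file through a connection and reduce it
chunk by chunk, so only one chunk is ever in memory:

  con <- file("dataset.100MM.txt", open = "r")   # placeholder file name
  hdr <- readLines(con, n = 1)                   # skip the header line
  total <- 0; n <- 0
  repeat {
    chunk <- scan(con, what = integer(), n = 1000000, quiet = TRUE)
    if (length(chunk) == 0) break                # end of file
    total <- total + sum(as.numeric(chunk))      # as.numeric avoids integer overflow
    n <- n + length(chunk)
  }
  close(con)
  total / n                                      # mean of the column, computed chunk-wise

Along the same lines, I imagine the file could be pushed into SQLite
(e.g., via the DBI and RSQLite packages) and queried for subsets or
aggregates with SQL, so R never has to hold the whole thing.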
Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent