[R] large data set, error: cannot allocate vector

Jason Barnhart jasoncbarnhart at msn.com
Fri May 5 20:24:32 CEST 2006


I can store 100,000,000 records in about the same space on WinXP, 
with --max-mem-size set to 1700M, and I have successfully stored larger 
objects as well.

Like you, I don't have enough space to compute a summary(), but I only 
have 2GB of RAM.  On my Linux box (it has 4GB of RAM) I've successfully 
allocated more memory to R and processed larger objects.

Have you tried playing w/ the memory settings?
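For concreteness, here's roughly what I mean (a sketch; note that 
memory.limit() is Windows-only and was removed in R 4.2, so the call 
is guarded):

```r
## Windows-only ceiling on R's total allocation (removed in R >= 4.2):
if (.Platform$OS.type == "windows" && getRversion() < "4.2.0") {
  print(memory.limit())          # current ceiling, in MB
  ## memory.limit(size = 1700)   # raise it to ~1700 MB for this session
}

## Or set the ceiling once at startup:
##   Rgui.exe --max-mem-size=1700M

## gc() reports current usage against the trigger on any platform:
gc()
```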

My results are below.
-jason


> tmp<-100000000:200000000
> length(tmp)/1000000
[1] 100
> gc()
           used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells   172832   4.7     350000   9.4    350000    9.4
Vcells 50063180 382.0  120448825 919.0 150074853 1145.0
> object.size(tmp)/length(tmp)
[1] 4
> object.size(tmp)
[1] 4e+08
> print(object.size(tmp)/1024^2,digits=15)
[1] 381.469760894775
> summary(tmp)
Error: cannot allocate vector of size 390625 Kb
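For what it's worth, even when summary() can't get its working copy 
(quantile() sorts a double-precision copy of the whole vector), cheaper 
pieces often still fit.  A sketch -- the 1e6-element vector here is just 
a stand-in for the big one, and the sample-based quartiles are only an 
approximation:

```r
## min/max/mean stream through the data and need almost no extra space:
tmp <- 1:1000000                  # small stand-in for the 100M-element vector
c(min = min(tmp), max = max(tmp), mean = mean(tmp))

## Approximate the quartiles from a random sample instead of a full sort:
set.seed(1)
q <- quantile(sample(tmp, 10000), probs = c(0.25, 0.50, 0.75))
q
```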


----- Original Message ----- 
From: "Robert Citek" <rwcitek at alum.calberkeley.org>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, May 05, 2006 9:30 AM
Subject: Re: [R] large data set, error: cannot allocate vector


>
> Oops.  I was off by an order of magnitude.  I meant 10^7 and 10^8
> rows of data for the first and second data sets, respectively.
>
> On May 5, 2006, at 10:24 AM, Robert Citek wrote:
>> R > foo <- read.delim("dataset.010MM.txt")
>>
>> R > summary(foo)
>>       X15623
>> Min.   :    1
>> 1st Qu.: 8152
>> Median :16459
>> Mean   :16408
>> 3rd Qu.:24618
>> Max.   :32766
>
> Reloaded the 10MM set and ran an object.size:
>
> R > object.size(foo)
> [1] 440000376
>
> So, 10 MM numbers in about 440 MB -- though that works out to roughly
> 44 bytes per value, not 4, so read.delim's data.frame must add
> considerable overhead beyond the 4 bytes (8 bits/byte * 4 bytes =
> 32 bits) a raw integer needs.  Either way, it would explain why
> 10 MM numbers work while 100 MM numbers won't (4 GB limit on a
> 32-bit machine).
>
> From Googling the archives, the solution that I've seen for working
> with large data sets seems to be moving to a 64-bit architecture.
> Short of that, are there any other generic workarounds, perhaps using
> a RDBMS or a CRAN package that enables working with arbitrarily large
> data sets?
>
> Regards,
> - Robert
> http://www.cwelug.org/downloads
> Help others get OpenSource software.  Distribute FLOSS
> for Windows, Linux, *BSD, and MacOS X with BitTorrent
>
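
On Robert's question about generic workarounds: one route short of 
64-bit is to push the aggregation into an RDBMS, so R never holds the 
whole vector.  A sketch, assuming the DBI and RSQLite packages are 
installed (the table and column names here are made up):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

## Stand-in for the real data set; in practice you'd bulk-load the file
## into the database outside of R, or dbWriteTable() it in chunks.
dbWriteTable(con, "big", data.frame(x = 1:100000))

## The database makes the pass over the data; R gets back one small row.
stats <- dbGetQuery(con,
  "SELECT MIN(x) AS min, AVG(x) AS mean, MAX(x) AS max FROM big")
print(stats)

dbDisconnect(con)
```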