[R] large data set, error: cannot allocate vector
Jason Barnhart
jasoncbarnhart at msn.com
Fri May 5 20:24:32 CEST 2006
I can store 100,000,000 records in about the same space on WinXP,
with --max-mem-size set to 1700M. I have also successfully stored larger
objects.
Like you, I don't have enough space to run summary(), but I only have
2GB of RAM. I've successfully allocated more RAM to R on my Linux box (it
has 4GB of RAM) and processed larger objects there.
Have you tried playing with the memory settings?
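The flag I mean, for reference (the 1700M value is just what fits on my 2GB WinXP box; adjust to taste):

```shell
## WinXP-era R startup flag; raises R's workspace memory cap to ~1700 MB
Rgui.exe --max-mem-size=1700M
```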
My results are below.
-jason
> tmp<-100000000:200000000
> length(tmp)/1000000
[1] 100
> gc()
            used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells    172832   4.7     350000   9.4    350000    9.4
Vcells  50063180 382.0  120448825 919.0 150074853 1145.0
> object.size(tmp)/length(tmp)
[1] 4
> object.size(tmp)
[1] 4e+08
> print(object.size(tmp)/1024^2,digits=15)
[1] 381.469760894775
> summary(tmp)
Error: cannot allocate vector of size 390625 Kb
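The size in that error message is informative in itself: 390625 Kb is essentially one more full 4-byte copy of the vector, which summary() needs as scratch space. A back-of-envelope check:

```r
## The failing allocation is one full 4-byte copy of the ~100M-element vector:
390625 * 1024       # bytes requested: 4e+08
390625 * 1024 / 4   # elements at 4 bytes each: 1e+08
```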
----- Original Message -----
From: "Robert Citek" <rwcitek at alum.calberkeley.org>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, May 05, 2006 9:30 AM
Subject: Re: [R] large data set, error: cannot allocate vector
>
> Oops. I was off by an order of magnitude. I meant 10^7 and 10^8
> rows of data for the first and second data sets, respectively.
>
> On May 5, 2006, at 10:24 AM, Robert Citek wrote:
>> R > foo <- read.delim("dataset.010MM.txt")
>>
>> R > summary(foo)
>> X15623
>> Min. : 1
>> 1st Qu.: 8152
>> Median :16459
>> Mean :16408
>> 3rd Qu.:24618
>> Max. :32766
>
> Reloaded the 10MM set and ran an object.size:
>
> R > object.size(foo)
> [1] 440000376
>
> So, 10 MM numbers in about 440 MB. (Are my units correct?) That
> would explain why 10 MM numbers work while 100 MM numbers won't
> (4 GB limit on a 32-bit machine). If my units are correct, though,
> each value is taking up about 44 bytes, not 4; the integer itself
> should only need a 4-byte word (8 bits/byte * 4 bytes = 32 bits),
> so the rest is presumably per-row overhead in the data frame.
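A quick empirical check of per-element storage (a small materialized vector as a stand-in; rep() is used so the values are actually allocated):

```r
## A bare integer vector costs ~4 bytes per element; the data.frame that
## read.delim() builds adds overhead on top (row names, attributes).
v <- rep(1L, 1000000)
as.numeric(object.size(v)) / length(v)  # just over 4 bytes per element
```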
>
> From Googling the archives, the solution I've seen for working
> with large data sets seems to be moving to a 64-bit architecture.
> Short of that, are there any other generic workarounds, perhaps
> using an RDBMS or a CRAN package that enables working with
> arbitrarily large data sets?
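Short of 64-bit, one generic workaround is streaming: read the file in fixed-size chunks and keep only running statistics, so one chunk at a time is in memory. A sketch (file layout, function name, and chunk size are placeholders; it assumes whitespace-separated numbers):

```r
## Minimal streaming summary: only chunk_size values are in memory at once.
chunk_summary <- function(path, chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0; total <- 0; lo <- Inf; hi <- -Inf
  repeat {
    x <- scan(con, what = double(), n = chunk_size, quiet = TRUE)
    if (length(x) == 0) break
    n     <- n + length(x)
    total <- total + sum(x)
    lo    <- min(lo, x)
    hi    <- max(hi, x)
  }
  c(n = n, min = lo, mean = total / n, max = hi)
}
```

Exact quantiles still require a full sort of the data, which is where an RDBMS or an on-disk storage package can take over.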
>
> Regards,
> - Robert
> http://www.cwelug.org/downloads
> Help others get OpenSource software. Distribute FLOSS
> for Windows, Linux, *BSD, and MacOS X with BitTorrent
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>