[R] large data set, error: cannot allocate vector

Robert Citek rwcitek at alum.calberkeley.org
Tue May 9 20:22:23 CEST 2006


On May 8, 2006, at 9:47 AM, Thomas Lumley wrote:
> On Fri, 5 May 2006, Robert Citek wrote:
>> Reloading the 10 MM dataset:
>>
>> R > foo <- read.delim("dataset.010MM.txt")
>>
>> R > object.size(foo)
>> [1] 440000376
>>
>> R > gc()
>>            used  (Mb) gc trigger  (Mb) max used  (Mb)
>> Ncells 10183941 272.0   15023450 401.2 10194267 272.3
>> Vcells 20073146 153.2   53554505 408.6 50086180 382.2
>>
>> Combined, Ncells or Vcells appear to take up about 700 MB of RAM,
>> which is about 25% of the 3 GB available under Linux on 32-bit
>> architecture.  Also, removing foo seemed to free up "used" memory,
>> but didn't change the "max used":
>
> No, that's what "max" means.  You need gc(reset=TRUE) to reset the  
> max.

Yup, that worked (see below).  The example in ?gc wasn't that clear
to me.  Thanks for clarifying.  I also found it informative to
compare loading the data into a data.frame with read.delim() versus
into a plain vector with scan().

$ cat <<eof | R -q --no-save
gc()
foo <- read.delim("dataset.010MM.txt")
gc()
rm(foo)
gc()
gc(reset=TRUE)
eof

R > gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 177865  4.8     407500 10.9   350000  9.4
Vcells  72114  0.6     786432  6.0   333941  2.6

R > foo <- read.delim("dataset.010MM.txt")

R > gc()
            used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 10179849 271.9   15023450 401.2 10180159 271.9
Vcells 20072448 153.2   47764583 364.5 46849682 357.5

R > rm(foo)

R > gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 179910  4.9   12018759 321.0 10181187 271.9
Vcells  72458  0.6   38211666 291.6 46849682 357.5

R > gc(reset=TRUE)
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 179920  4.9    9615007 256.8   179920  4.9
Vcells  72482  0.6   30569332 233.3    72482  0.6
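
The same "max used" behaviour can be reproduced without the data file.
What follows is only a minimal sketch: it assumes a fresh R session, and
the 1e7-element vector is an arbitrary stand-in for the real dataset.

```r
# Start from a clean "max used" baseline.
invisible(gc(reset = TRUE))

# Allocate a large numeric vector (1e7 doubles, roughly 80 MB), then drop it.
# The size is an illustrative assumption, not taken from the dataset.
x <- numeric(1e7)
rm(x)

# A plain gc() frees the memory, but "max used" still remembers the peak.
after_rm <- gc()

# gc(reset = TRUE) collects and then resets "max used" to the current usage.
after_reset <- gc(reset = TRUE)

print(after_rm["Vcells", c("used", "max used")])
print(after_reset["Vcells", c("used", "max used")])
```

The first line printed should show a peak of at least 1e7 Vcells, and
the second a peak equal to what is currently in use.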

$ cat <<eof | R -q --no-save
gc()
foo <- scan("dataset.010MM.txt")
gc()
rm(foo)
gc()
gc(reset=TRUE)
eof

R > gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 177865  4.8     407500 10.9   350000  9.4
Vcells  72114  0.6     786432  6.0   333941  2.6

R > foo <- scan("dataset.010MM.txt")
Read 10000000 items

R > gc()
            used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   178230  4.8     407500  10.9   350000   9.4
Vcells 10072185 76.9   26713872 203.9 26456224 201.9

R > rm(foo)

R > gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 178286  4.8     407500  10.9   350000   9.4
Vcells  72190  0.6   21371097 163.1 26456224 201.9

R > gc(reset=TRUE)
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 178296  4.8     407500  10.9   178296  4.8
Vcells  72214  0.6   17096877 130.5    72214  0.6
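
To put the two runs side by side, the (Mb) columns can be recomputed
from the cell counts.  This is a rough sketch that assumes the cell
sizes ?Memory gives for a 32-bit build (28 bytes per Ncell, 8 bytes per
Vcell) and that gc() rounds up to 0.1 Mb; cells_to_mb is a made-up
helper name, not part of R.

```r
# Assumed cell sizes for a 32-bit build of R (see ?Memory).
ncell_bytes <- 28
vcell_bytes <- 8

# Hypothetical helper: convert a cell count to Mb, rounded up to 0.1
# in the way the gc() printout appears to do.
cells_to_mb <- function(cells, bytes_per_cell) {
  ceiling(10 * cells * bytes_per_cell / 1024^2) / 10
}

# "used" counts taken from the gc() output right after each load.
df_total  <- cells_to_mb(10179849, ncell_bytes) +   # read.delim: Ncells
             cells_to_mb(20072448, vcell_bytes)     # read.delim: Vcells
vec_total <- cells_to_mb(178230, ncell_bytes) +     # scan: Ncells
             cells_to_mb(10072185, vcell_bytes)     # scan: Vcells

cat("data.frame via read.delim:", df_total, "Mb in use\n")
cat("vector via scan:          ", vec_total, "Mb in use\n")
cat("ratio:", round(df_total / vec_total, 1), "\n")
```

Under those assumptions the per-row values match the (Mb) columns
printed above, and the totals come to about 425 Mb for the data.frame
against about 82 Mb for the vector.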

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent



