[R] Memory/data -last time I promise

Micheall Taylor pols1oh at bestweb.net
Tue Jul 24 14:00:38 CEST 2001



I've seen several posts over the past 2-3 weeks about memory issues.  I've
tried to carefully follow the suggestions, but remain baffled as to why I
can't load data into R.  I hope that in revisiting this issue I don't
exasperate the list.

The setting: 
1 GB RAM, Linux machine
10 Stata files of approximately 14 MB each
File contents appear at the end of this boorishly long email.

Purpose: 
load and combine in R for further analysis

Question:

1) I've placed memory queries in the command file to see what is going on.
It appears that loading a 14 MB file consumes approximately 5 times that
amount of memory, i.e. available memory declines by about 70 MB when a
14 MB dataset is loaded. (Seen in Method 2 below.)
2) Ultimately I would like to replace Stata with R, but the Stata datasets
I frequently use are in the 100s of MB, and they work fine in Stata on this
machine. Is R capable of this?
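
For readers checking the numbers, here is a rough sketch of how the gc() counts in the logs below convert to megabytes. It assumes a 32-bit build of R in which an Ncell occupies 28 bytes and a Vcell 8 bytes; the logs do not state the build, so the 28-byte figure is an assumption. Note also that gc() counts everything in the session (base packages, foreign), not only the data frame.

```r
# Counts copied from the first gc() call in the Method 2 log.
ncells_used <- 1018825
vcells_used <- 4456285

# Assumed cell sizes for a 32-bit build of R (not stated in the log).
ncell_bytes <- 28
vcell_bytes <- 8

ncell_mb <- ncells_used * ncell_bytes / 1024^2
vcell_mb <- vcells_used * vcell_bytes / 1024^2
total_mb <- ncell_mb + vcell_mb

cat(sprintf("Ncells: %.1f Mb, Vcells: %.1f Mb, total: %.1f Mb\n",
            ncell_mb, vcell_mb, total_mb))
cat(sprintf("Ratio to a 14 Mb file: %.1f\n", total_mb / 14))
```

Under these assumptions the session holds roughly 61 Mb after one file is read, which is in the same range as the drop in free memory described in question 1.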


The command files:

I've attempted the process in two ways, each time both as a regular user
(ulimit=unlimited) and as root on the system to avoid OS restrictions.
The first method is as follows:

FIRST METHOD

R --no-save --max-vsize=800M < QuickLook.R > QuickLook.log
========   QuickLook.log follows ================
> library(foreign)
> a <- Sys.time()
> full <- read.dta('../off/off10yr1.dta')
> gc()
          used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1018821 27.3    1166886 31.2         NA
Vcells 4456284 34.0    5070089 38.7        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 696303616 376999936        0 21487616 36982784
Swap: 271392768 263294976  8097792
MemTotal:   1048148 kB
MemFree:     368164 kB
MemShared:        0 kB
Buffers:      20984 kB
Cached:       36116 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:      7908 kB
> n <- 2
> while (n<=3) {
+       fname1 <- paste('../off/off10yr',n,'.dta', sep="")
+       full <- rbind(read.dta(fname1),  full)
+       gc()
+       system('cat /proc/meminfo')
+       n
+       n <- n+1}
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 780275712 293027840        0 21487616 51609600
Swap: 271392768 263294976  8097792
MemTotal:   1048148 kB
MemFree:     286160 kB
MemShared:        0 kB
Buffers:      20984 kB
Cached:       50400 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:      7908 kB
Error: cannot allocate vector of size 3291 Kb
Execution halted
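
One alternative to growing `full` inside the loop is to read every file into a list first and call rbind() once at the end. The following is only a sketch: `combine_files`, its `reader` argument, and `fake_reader` are hypothetical names introduced for illustration, and nothing here shows that this would have avoided the error above, since all the pieces and the combined result must still fit in memory at the same time.

```r
# Hypothetical helper: read each path with `reader`, then bind all rows once.
combine_files <- function(paths, reader) {
  pieces <- lapply(paths, reader)
  do.call(rbind, pieces)
}

# Stand-in reader so the sketch runs without the .dta files;
# with the real data this would be read.dta from the foreign package.
fake_reader <- function(path) {
  data.frame(source = path, yr = 2001:2003, ginv = c(0.04, 0.042, 0.031))
}

paths <- paste('../off/off10yr', 1:3, '.dta', sep = "")
full <- combine_files(paths, fake_reader)
print(dim(full))
```

The design point is simply that rbind() is called once on the whole list instead of once per file on an ever larger data frame.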



SECOND METHOD


> library(foreign)
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 637681664 435621888        0 21753856 31592448
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     425412 kB
MemShared:        0 kB
Buffers:      21244 kB
Cached:       30852 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> a <- Sys.time()
> full1 <- read.dta('../off/off10yr1.dta')
> gc()
          used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1018825 27.3    1166886 31.2         NA
Vcells 4456285 34.0    5070086 38.7        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 707162112 366141440        0 21757952 45498368
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     357560 kB
MemShared:        0 kB
Buffers:      21248 kB
Cached:       44432 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full2 <- read.dta('../off/off10yr2.dta')
> gc()
          used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1861390 49.8    2105982 56.3         NA
Vcells 8879476 67.8    9315972 71.1        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 777375744 295927808        0 21757952 59826176
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     288992 kB
MemShared:        0 kB
Buffers:      21248 kB
Cached:       58424 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full3 <- read.dta('../off/off10yr3.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  2703952  72.3    3708127  99.1         NA
Vcells 13302667 101.5   14190661 108.3        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 847650816 225652736        0 21757952 74153984
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     220364 kB
MemShared:        0 kB
Buffers:      21248 kB
Cached:       72416 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full4 <- read.dta('../off/off10yr4.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  3546514  94.8    4953636 132.3         NA
Vcells 17725858 135.3   18735437 143.0        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 917798912 155504640        0 21762048 88481792
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     151860 kB
MemShared:        0 kB
Buffers:      21252 kB
Cached:       86408 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full5 <- read.dta('../off/off10yr5.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  4389076 117.3    6193578 165.4         NA
Vcells 22149049 169.0   23279670 177.7        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 988033024 85270528        0 21770240 102809600
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:      83272 kB
MemShared:        0 kB
Buffers:      21260 kB
Cached:      100400 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full6 <- read.dta('../off/off10yr6.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  5231638 139.7    7700734 205.7         NA
Vcells 26572240 202.8   27312192 208.4        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 1058263040 15040512        0 21774336 117137408
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:      14688 kB
MemShared:        0 kB
Buffers:      21264 kB
Cached:      114392 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full7 <- read.dta('../off/off10yr7.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  6074200 162.2    8572058 228.9         NA
Vcells 30995431 236.5   31726362 242.1        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 1069006848  4296704        0 21471232 72318976
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:       4196 kB
MemShared:        0 kB
Buffers:      20968 kB
Cached:       70624 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full8 <- read.dta('../off/off10yr8.dta')
Error: cannot allocate vector of size 1645 Kb
Execution halted
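
The figures in the log above can be tabulated to compare what R reports with what the OS reports. This is a sketch with values copied by hand from the gc() and /proc/meminfo output; the variable names are made up for illustration. It suggests that R's own usage grows by roughly 56 MB per file and reaches about 400 MB after seven files, well below the 800 MB vsize limit, while MemFree started at about 415 MB with swap almost full and Cached grew by about the size of each file. Reading this as the machine, rather than R's limit, running out of memory is an interpretation and not something the log proves.

```r
# "used (Mb)" columns from gc() after loading files 1..7.
ncells_mb <- c(27.3, 49.8, 72.3, 94.8, 117.3, 139.7, 162.2)
vcells_mb <- c(34.0, 67.8, 101.5, 135.3, 169.0, 202.8, 236.5)

# MemFree and Cached (kB) from /proc/meminfo: before any load, then after files 1..7.
memfree_kb <- c(425412, 357560, 288992, 220364, 151860, 83272, 14688, 4196)
cached_kb  <- c(30852, 44432, 58424, 72416, 86408, 100400, 114392, 70624)

r_total_mb    <- ncells_mb + vcells_mb      # R's reported usage after each file
per_file_mb   <- diff(r_total_mb)           # growth in R per additional file
free_drop_mb  <- -diff(memfree_kb) / 1024   # drop in free memory per file
cache_gain_mb <- diff(cached_kb) / 1024     # change in the OS page cache per file

print(round(r_total_mb, 1))
print(round(per_file_mb, 1))
print(round(free_drop_mb, 1))
print(round(cache_gain_mb, 1))
```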


THIRD METHOD

I combined the Stata files in Stata (same machine) and saved them as a
single file, thinking there could be an inefficiency with rbind(). Same
error.


TO ASSURE YOU THAT I AM NOT CRAZY, THE FOLLOWING IS A SAMPLE DIRECTORY
LISTING OF THE FILES OF INTEREST

-rw-r--r--    1 ctaylor  econ          14M Jun 27 16:15 off10yr5.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 17:53 off10yr6.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 19:30 off10yr7.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 21:08 off10yr8.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 23:02 off10yr9.dt


DATA CONTENTS (IN TEXT FORM OF COURSE)

head off10yr1.out
scenario        metcode yr      ginv    cons    gocc    abs     dvac    gmre    gmer
1       "AA"    2001    .04     3384000 .047    3641000 -.006   .025    .028
1       "AA"    2002    .042    3657000 .046    3716000 -.004   .034    .035
1       "AA"    2003    .031    2816000 .047    3972000 -.015   .051    .056
1       "AA"    2004    .035    3271000 .046    4064000 -.01    .075    .078
1       "AA"    2005    .037    3636000 .037    3444000 0       .084    .084
1       "AA"    2006    .041    4183000 .035    3315000 .006    .118    .116
1       "AA"    2007    .043    4513000 .019    1915000 .021    .094    .086
1       "AA"    2008    .039    4320000 .034    3431000 .005    .068    .066
1       "AA"    2009    .034    3848000 .05     5262000 -.015   .057    .063



