[R] Memory/data -last time I promise
Micheall Taylor
pols1oh at bestweb.net
Tue Jul 24 14:00:38 CEST 2001
I've seen several posts over the past 2-3 weeks about memory issues. I've
tried to carefully follow the suggestions, but remain baffled as to why I
can't load data into R. I hope that in revisiting this issue I don't
exasperate the list.
The setting:
1 gig RAM, Linux machine
10 Stata files of approximately 14 megs each
File contents appear at the end of this boorishly long email.
Purpose:
load and combine in R for further analysis
Question:
1) I've placed memory queries in the command file to see what is going on.
It appears that loading a 14 meg file consumes approximately 5 times that
amount of memory, i.e. available memory declines by about 70 megs when a
14 meg dataset is loaded (seen in the second method below).
2) Ultimately I would like to replace Stata with R, but the Stata datasets
I frequently use are in the 100s of megs, which work fine on this machine.
Is R capable of this?
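As a rough check on the figures in question 1, the arithmetic can be sketched in R using only numbers copied from the second method's log. The cell sizes (8 bytes per Vcell, 28 bytes per Ncell) are assumptions for a 32-bit build; they are consistent with the Mb column that gc() prints, but are not stated anywhere in this post. The "used" figures also include whatever R itself occupied before the file was read, so not all of the heap total is the dataset.

```r
# Figures copied from the second method's log: gc() after the first
# read.dta() call, and MemFree from /proc/meminfo before and after it.
ncells_used    <- 1018825   # cons cells in use after loading file 1
vcells_used    <- 4456285   # vector cells in use after loading file 1
memfree_before <- 425412    # MemFree in kB before the load
memfree_after  <- 357560    # MemFree in kB after the load
file_mb        <- 14        # file size as reported by ls

# ASSUMPTION: cell sizes for a 32-bit build of R.
bytes_per_ncell <- 28
bytes_per_vcell <- 8

# Memory that R's heap accounts for, in Mb.
heap_mb <- (ncells_used * bytes_per_ncell +
            vcells_used * bytes_per_vcell) / 1024^2

# Drop in free memory as seen by the operating system, in Mb.
os_drop_mb <- (memfree_before - memfree_after) / 1024

cat(sprintf("R heap in use:    %.1f Mb\n", heap_mb))
cat(sprintf("MemFree decline:  %.1f Mb\n", os_drop_mb))
cat(sprintf("Decline per file Mb: %.1fx\n", os_drop_mb / file_mb))
```

On these numbers the operating system's view and R's own view are of the same order, which suggests that most of the decline is R's heap rather than something outside R.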
The command files:
I've attempted the process in two ways, each time both as a regular user
(ulimit=unlimited) and as root on the system, to avoid OS restrictions.
The first method is as follows:
FIRST METHOD
R --no-save --max-vsize=800M < QuickLook.R > QuickLook.log
======== QuickLook.log follows ================
> library(foreign)
> a <- Sys.time()
> full <- read.dta('../off/off10yr1.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  1018821  27.3    1166886  31.2         NA
Vcells  4456284  34.0    5070089  38.7        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 696303616 376999936 0 21487616 36982784
Swap: 271392768 263294976 8097792
MemTotal: 1048148 kB
MemFree: 368164 kB
MemShared: 0 kB
Buffers: 20984 kB
Cached: 36116 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 7908 kB
> n <- 2
> while (n<=3) {
+ fname1 <- paste('../off/off10yr',n,'.dta', sep="")
+ full <- rbind(read.dta(fname1), full)
+ gc()
+ system('cat /proc/meminfo')
+ n
+ n <- n+1}
total: used: free: shared: buffers: cached:
Mem: 1073303552 780275712 293027840 0 21487616 51609600
Swap: 271392768 263294976 8097792
MemTotal: 1048148 kB
MemFree: 286160 kB
MemShared: 0 kB
Buffers: 20984 kB
Cached: 50400 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 7908 kB
Error: cannot allocate vector of size 3291 Kb
Execution halted
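The loop above grows `full` with one rbind() per file, so the old and new copies of the accumulated data frame exist at the same time on every pass. A minimal sketch of one alternative, reading every file first and binding once, is shown here. `combine_files` and its `reader` argument are hypothetical names introduced for illustration, and the demonstration uses small temporary CSV files rather than the Stata files from this post. Whether this avoids the allocation error on the real data is untested.

```r
# Hypothetical helper: read every file first, then bind once.
# `reader` is any function that takes a path and returns a data frame.
combine_files <- function(paths, reader) {
  pieces <- lapply(paths, reader)   # one data frame per file
  do.call(rbind, pieces)            # a single rbind over all pieces
}

# Self-contained demonstration with three small temporary CSV files.
paths <- character(3)
for (i in 1:3) {
  paths[i] <- tempfile(fileext = ".csv")
  write.csv(data.frame(scenario = i, yr = 2001:2003), paths[i],
            row.names = FALSE)
}

full <- combine_files(paths, read.csv)
dim(full)   # 9 rows, 2 columns
```

For the files in this post the call would presumably be `combine_files(paste('../off/off10yr', 1:10, '.dta', sep=""), read.dta)` after `library(foreign)`. Note that this stacks the files in ascending order, whereas the loop above puts each new file on top.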
SECOND METHOD
> library(foreign)
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 637681664 435621888 0 21753856 31592448
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 425412 kB
MemShared: 0 kB
Buffers: 21244 kB
Cached: 30852 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> a <- Sys.time()
> full1 <- read.dta('../off/off10yr1.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  1018825  27.3    1166886  31.2         NA
Vcells  4456285  34.0    5070086  38.7        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 707162112 366141440 0 21757952 45498368
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 357560 kB
MemShared: 0 kB
Buffers: 21248 kB
Cached: 44432 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full2 <- read.dta('../off/off10yr2.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  1861390  49.8    2105982  56.3         NA
Vcells  8879476  67.8    9315972  71.1        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 777375744 295927808 0 21757952 59826176
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 288992 kB
MemShared: 0 kB
Buffers: 21248 kB
Cached: 58424 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full3 <- read.dta('../off/off10yr3.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  2703952  72.3    3708127  99.1         NA
Vcells 13302667 101.5   14190661 108.3        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 847650816 225652736 0 21757952 74153984
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 220364 kB
MemShared: 0 kB
Buffers: 21248 kB
Cached: 72416 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full4 <- read.dta('../off/off10yr4.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  3546514  94.8    4953636 132.3         NA
Vcells 17725858 135.3   18735437 143.0        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 917798912 155504640 0 21762048 88481792
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 151860 kB
MemShared: 0 kB
Buffers: 21252 kB
Cached: 86408 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full5 <- read.dta('../off/off10yr5.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  4389076 117.3    6193578 165.4         NA
Vcells 22149049 169.0   23279670 177.7        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 988033024 85270528 0 21770240 102809600
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 83272 kB
MemShared: 0 kB
Buffers: 21260 kB
Cached: 100400 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full6 <- read.dta('../off/off10yr6.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  5231638 139.7    7700734 205.7         NA
Vcells 26572240 202.8   27312192 208.4        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 1058263040 15040512 0 21774336 117137408
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 14688 kB
MemShared: 0 kB
Buffers: 21264 kB
Cached: 114392 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full7 <- read.dta('../off/off10yr7.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  6074200 162.2    8572058 228.9         NA
Vcells 30995431 236.5   31726362 242.1        800
> system('cat /proc/meminfo')
total: used: free: shared: buffers: cached:
Mem: 1073303552 1069006848 4296704 0 21471232 72318976
Swap: 271392768 261148672 10244096
MemTotal: 1048148 kB
MemFree: 4196 kB
MemShared: 0 kB
Buffers: 20968 kB
Cached: 70624 kB
BigTotal: 131064 kB
BigFree: 0 kB
SwapTotal: 265032 kB
SwapFree: 10004 kB
> full8 <- read.dta('../off/off10yr8.dta')
Error: cannot allocate vector of size 1645 Kb
Execution halted
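The memory queries in the log above go through /proc/meminfo, where MemFree also falls as the kernel caches the files being read (the Cached line grows by roughly 13 to 14 megs per load). A minimal sketch of tracking R's own heap instead follows, assuming only that the second column of the matrix returned by gc() is the "(Mb)" figure for cells in use. `heap_used_mb` is a hypothetical helper name, and the numeric vector is a stand-in for a loaded dataset.

```r
# Hypothetical helper: total Mb currently in use on R's heap.
# gc() returns a matrix with rows Ncells and Vcells; column 2 is the
# "(Mb)" figure that goes with the "used" column.
heap_used_mb <- function() sum(gc()[, 2])

before <- heap_used_mb()
x <- numeric(1e6)          # stand-in for a dataset: 1e6 doubles, 8 bytes each
after <- heap_used_mb()

cat(sprintf("heap grew by about %.1f Mb\n", after - before))
```

Calling such a helper before and after each read.dta() would give a per-file figure that is independent of the file cache.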
THIRD METHOD
I combined the Stata files in Stata (same machine) and saved them as a
single file, thinking there could be an inefficiency with rbind(). Same
error.
TO ASSURE YOU THAT I AM NOT CRAZY, THE FOLLOWING IS A SAMPLE DIRECTORY
LISTING OF THE FILES OF INTEREST
-rw-r--r-- 1 ctaylor econ 14M Jun 27 16:15 off10yr5.dta
-rw-r--r-- 1 ctaylor econ 14M Jun 27 17:53 off10yr6.dta
-rw-r--r-- 1 ctaylor econ 14M Jun 27 19:30 off10yr7.dta
-rw-r--r-- 1 ctaylor econ 14M Jun 27 21:08 off10yr8.dta
-rw-r--r-- 1 ctaylor econ 14M Jun 27 23:02 off10yr9.dt
DATA CONTENTS (IN TEXT FORM OF COURSE)
head off10yr1.out
scenario metcode   yr ginv    cons gocc     abs  dvac gmre gmer
       1 "AA"    2001  .04 3384000 .047 3641000 -.006 .025 .028
       1 "AA"    2002 .042 3657000 .046 3716000 -.004 .034 .035
       1 "AA"    2003 .031 2816000 .047 3972000 -.015 .051 .056
       1 "AA"    2004 .035 3271000 .046 4064000  -.01 .075 .078
       1 "AA"    2005 .037 3636000 .037 3444000     0 .084 .084
       1 "AA"    2006 .041 4183000 .035 3315000  .006 .118 .116
       1 "AA"    2007 .043 4513000 .019 1915000  .021 .094 .086
       1 "AA"    2008 .039 4320000 .034 3431000  .005 .068 .066
       1 "AA"    2009 .034 3848000  .05 5262000 -.015 .057 .063