[R] data usage

Douglas Bates bates at stat.wisc.edu
Mon Mar 29 15:25:34 CEST 2004


Edwin Leuven <e.leuven at uva.nl> writes:

> For my present project I need to use the data stored in a ca. 100 MB
> Stata dataset.
> 
> When I import the data in R using:
> 
> library("foreign")
> x <- read.dta("mydata.dta")
> 
> I find that R needs a startling 665 MB of memory!
> 
> (In Stata I can simply allocate, say, 128 MB of memory and go ahead.)
> 
> Is there any way around this, or should I forget R for analysis of
> datasets of this magnitude?

What does the 665 MB represent?  Did you try doing a garbage
collection after the import?
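To see what that 665 MB actually represents, one option is a sketch
along these lines, using gcinfo() from base R (the file name is the
one from your message):

library("foreign")
gcinfo(TRUE)                  # print a report at each garbage collection
x <- read.dta("mydata.dta")   # watch how much memory is live vs. transient
gcinfo(FALSE)                 # turn the reports back off
gc()                          # the "used" columns show memory actually held

Much of a large peak during import is often transient allocation that
a collection returns to the free pool.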

I would suggest

library("foreign")
x<-read.dta("mydata.dta")
gc()              # possibly repeat gc() to lower the thresholds
object.size(x)    # the actual storage (in bytes) allocated to this object
save(x, file = "mydata.rda", compress = TRUE)
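If the raw byte count from object.size() is unwieldy, converting it to
megabytes is plain arithmetic (a sketch; as.numeric() just strips any
class attribute from the returned value):

size.mb <- as.numeric(object.size(x)) / 1024^2
size.mb           # size of x in megabytes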

After that you can start a new session and use

load("mydata.rda")

to obtain a copy of the data set without the storage overhead incurred
by the Stata -> R conversion.
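A minimal sketch of that second session (assuming mydata.rda sits in
the working directory):

## fresh R session; the foreign package is no longer needed
load("mydata.rda")   # restores the object under its original name, x
gc()                 # confirm the smaller footprint
str(x)               # quick check that the data survived the round trip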

P.S. As described in the help page for object.size, the returned value
is more properly described as an estimate, because the size of an
object is sometimes difficult to determine accurately.
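One case where the estimate can be off (a hedged illustration, not
from the original message): object.size does not detect storage shared
between components of a list, so it can double-count.

v <- rnorm(1e6)
z <- list(a = v, b = v)   # both components point at the same vector
object.size(z)            # reported as roughly twice object.size(v)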
