[R] R can't load a large dataset

David Winsemius dwinsemius at comcast.net
Thu Mar 1 01:26:32 CET 2012


On Feb 29, 2012, at 7:00 PM, Francesco Sarracino wrote:

> Dear R listers,
>
> I have a silly problem. I am trying to load a dta (Stata) file in R.
> The dta is about 650 MB and contains the integrated World Values
> Survey/European Values Study data set.
> My problem is that I can't manage to load the file. After almost 1
> hour I issued the following command:
> data <- read.dta("http://www.stata-press.com/data/kkd/data1.dta",
>  convert.dates = TRUE, convert.factors = TRUE,
>      missing.type = FALSE,
>      convert.underscore = FALSE, warn.missing.labels = TRUE)

With that URL I get a MUCH smaller data.frame:

require(foreign)
...then your code:

(Almost instantaneous return to console prompt.)

 > str(data)
'data.frame':	3340 obs. of  47 variables:
  $ persnr  : int  2229 3994 6326 8660 10622 13277 15241 17852 19635 21501 ...
  $ intnr   : int  145700 256862 166979 120826 154849 138118 13277 160539 194697 150495 ...
  $ state   : Factor w/ 16 levels "Berlin","Schl.Hst",..: 6 15 2 6 6 10 6 6 10 9 ...
  $ gender  : Factor w/ 2 levels "Maenner","Frauen": 1 1 2 1 1 1 1 1 2 2 ...
Snipped a few pages...


The column names don't really look like what you describe:

 >  names(data)
  [1] "persnr"   "intnr"    "state"    "gender"   "ybirth"   "ymove"
  [7] "ybuild"   "hcond"    "sqm"      "rooms"    "fseval"   "kitchen"
[13] "shower"   "wc"       "heating"  "cellar"   "balcony"  "garden"
[19] "phone"    "renttype" "rent"     "renteval" "hhtype"   "htype"
[25] "area"     "np11701"  "np0105"   "np9401"   "np9402"   "np9403"
[31] "np9501"   "np9502"   "np9503"   "np9504"   "np9506"   "np9507"
[37] "hhpos"    "hhsize"   "marital"  "edu"      "voc"      "yedu"
[43] "emp"      "occ"      "hhinc"    "income"   "egph"

 >  sessionInfo()
R version 2.14.0 Patched (2011-11-13 r57650)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     grDevices utils     datasets  graphics  methods
[7] base

other attached packages:
[1] foreign_0.8-47 sos_1.3-1      brew_1.0-6     lattice_0.20-0

loaded via a namespace (and not attached):
[1] grid_2.14.0  tools_2.14.0
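
To narrow down where the time goes, it may help to time the load and compare the file's size on disk with the object's size in memory. What follows is only a minimal sketch, assuming the foreign package is installed. The `toy` data frame and the temporary file are made up for illustration and stand in for the real .dta file; point `path` at your own file instead.

```r
library(foreign)

# Hypothetical stand-in data; with real data, skip this part and
# set 'path' to the location of your own .dta file.
toy <- data.frame(id    = 1:1000,
                  score = rnorm(1000),
                  grp   = factor(sample(c("a", "b"), 1000, replace = TRUE)))
path <- tempfile(fileext = ".dta")
write.dta(toy, path)

file.info(path)$size                          # size on disk, in bytes
timing <- system.time(dat <- read.dta(path))  # time the load itself
timing["elapsed"]                             # wall-clock seconds
print(object.size(dat), units = "Kb")         # size once held in memory
dim(dat)                                      # rows and columns actually read
```

If the in-memory size comes close to the installed RAM, the machine will start swapping, which would be consistent with a system that becomes slow and unresponsive.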


>
> I still don't have my data loaded. Moreover, my system becomes very
> slow and not responsive.
> I can't figure out what is going on.
> Here are my specs:
> Ubuntu Linux 11.10 x86_64-pc-linux-gnu (64-bit)
> Intel Core i7, 4 GB RAM, 367 GB Free HD, 8 GB swap memory
> R:
> R version 2.14.1 (2011-12-22)
>
> Can you please help me figure out what's wrong? I think it's
> impossible that R can't handle files of similar sizes.
> Thanks a lot,
> f.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


