[R] R can't load a large dataset
David Winsemius
dwinsemius at comcast.net
Thu Mar 1 01:26:32 CET 2012
On Feb 29, 2012, at 7:00 PM, Francesco Sarracino wrote:
> Dear R listers,
>
> I have a silly problem. I am trying to load a dta (Stata) file in R.
> The dta is about 650 MB and contains the integrated World Values
> Survey/European Values Study data set.
> My problem is that I can't manage to load the file. I issued the
> following command almost an hour ago:
> data <- read.dta("http://www.stata-press.com/data/kkd/data1.dta",
> convert.dates = TRUE, convert.factors = TRUE,
> missing.type = FALSE,
> convert.underscore = FALSE, warn.missing.labels = TRUE)
I get a MUCH smaller data.frame:
require(foreign)
...then your code:
(Almost instantaneous return to console prompt.)
> str(data)
'data.frame': 3340 obs. of 47 variables:
 $ persnr : int 2229 3994 6326 8660 10622 13277 15241 17852 19635 21501 ...
 $ intnr  : int 145700 256862 166979 120826 154849 138118 13277 160539 194697 150495 ...
 $ state  : Factor w/ 16 levels "Berlin","Schl.Hst",..: 6 15 2 6 6 10 6 6 10 9 ...
 $ gender : Factor w/ 2 levels "Maenner","Frauen": 1 1 2 1 1 1 1 1 2 2 ...
Snipped a few pages...
The column names don't look like the World Values Survey/European Values
Study data you describe:
> names(data)
[1] "persnr" "intnr" "state" "gender" "ybirth" "ymove"
[7] "ybuild" "hcond" "sqm" "rooms" "fseval" "kitchen"
[13] "shower" "wc" "heating" "cellar" "balcony" "garden"
[19] "phone" "renttype" "rent" "renteval" "hhtype" "htype"
[25] "area" "np11701" "np0105" "np9401" "np9402" "np9403"
[31] "np9501" "np9502" "np9503" "np9504" "np9506" "np9507"
[37] "hhpos" "hhsize" "marital" "edu" "voc" "yedu"
[43] "emp" "occ" "hhinc" "income" "egph"
> sessionInfo()
R version 2.14.0 Patched (2011-11-13 r57650)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats grDevices utils datasets graphics methods
[7] base
other attached packages:
[1] foreign_0.8-47 sos_1.3-1 brew_1.0-6 lattice_0.20-0
loaded via a namespace (and not attached):
[1] grid_2.14.0 tools_2.14.0
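A quick way to catch a mismatch like this before waiting an hour: download the file once, look at its size on disk, and check the dimensions of what read.dta() returns against what you expect. This is only a sketch, assuming the foreign package; check_dta() is a hypothetical helper, not part of foreign, and the download lines are commented out because they need network access.

```r
library(foreign)

# Hypothetical helper (not part of foreign): report the on-disk size of a
# .dta file, read it with the same options as in the original post, and
# print its dimensions so they can be compared with what you expect.
check_dta <- function(path) {
  size_mb <- file.info(path)$size / 1024^2
  dat <- read.dta(path, convert.dates = TRUE, convert.factors = TRUE,
                  missing.type = FALSE, convert.underscore = FALSE,
                  warn.missing.labels = TRUE)
  cat(sprintf("%s: %.1f MB on disk, %d obs. of %d variables\n",
              basename(path), size_mb, nrow(dat), ncol(dat)))
  invisible(dat)
}

# Usage (needs network access): download once, then read the local copy.
# url   <- "http://www.stata-press.com/data/kkd/data1.dta"
# local <- file.path(tempdir(), "data1.dta")
# download.file(url, local, mode = "wb")
# data  <- check_dta(local)
```

If the reported size is a few hundred KB rather than 650 MB, the URL points at a different file than the one you meant to load.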
>
> I still don't have my data loaded. Moreover, my system becomes very
> slow and unresponsive.
> I can't figure out what is going on.
> Here are my specs:
> Ubuntu Linux 11.10 x86_64-pc-linux-gnu (64-bit)
> Intel Core i7, 4 GB RAM, 367 GB free disk space, 8 GB swap
> R:
> R version 2.14.1 (2011-12-22)
>
> Can you please help me figure out what's wrong? I can't believe that
> R is unable to handle files of this size.
> Thanks a lot,
> f.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
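On the slowdown: read.dta() builds the whole data frame in RAM, so on a 4 GB machine a large file may push the system into swap, and the in-memory size can differ from the size on disk. A rough sketch, assuming the data is already loaded as a data frame: object.size() (from utils) gives an estimate of the memory an R object occupies. size_in_mb() is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: approximate RAM used by an R object, in MB.
# object.size() returns an estimate in bytes, not an exact figure.
size_in_mb <- function(obj) {
  as.numeric(object.size(obj)) / 1024^2
}

# Example: one million doubles take about 8 bytes each.
demo <- data.frame(x = numeric(1e6))
size_in_mb(demo)

# With real data, compare size_in_mb(data) against installed RAM;
# gc() also reports how much memory R is currently using.
```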
David Winsemius, MD
West Hartford, CT