[R] How to deal with more than 6GB dataset using R?
Allan Engelhardt
allane at cybaea.com
Fri Jul 23 18:39:25 CEST 2010
read.table() is not particularly inefficient IF you specify the colClasses=
parameter. scan() (with the what= parameter) is probably a little more
efficient. In either case, save the data using save() once you have it
in the right structure, and it will be much faster to read next time.
(In fact, I often exit R at this stage and restart it with the .RData
file before I start the analysis, to clear out the memory.)
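
Something along these lines, as a minimal sketch (the file name
"bigdata.csv" and the column classes are just placeholders for whatever
your data actually look like):

dat <- read.table("bigdata.csv", header = TRUE, sep = ",",
                  colClasses = c("character", "numeric", "numeric", "numeric"),
                  comment.char = "")   # turning off comment scanning saves a little time

## scan() returns a list of columns and can be slightly faster still:
## cols <- scan("bigdata.csv", sep = ",", skip = 1,   # skip the header row
##              what = list(character(), numeric(), numeric(), numeric()))

## Save the parsed object once; later sessions skip the slow text parse entirely.
save(dat, file = "bigdata.RData")

## Then, in a fresh R session:
load("bigdata.RData")
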
I did a lot of testing on the types of (large) data structures I
normally work with and found that

    options("save.defaults" =
            list(compress = "bzip2", compression_level = 6, ascii = FALSE))

gave me the best trade-off between size and speed. Your mileage will
undoubtedly vary, but if you do this a lot it may be worth getting hard
data for your setup.
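
If you want hard numbers, something like the following (assuming your
object is called dat; the test file names are made up) will give you
timings and file sizes for a few compression settings on your own data:

for (cmp in c("gzip", "bzip2", "xz")) {
    f  <- paste("test_", cmp, ".RData", sep = "")
    tm <- system.time(save(dat, file = f,
                           compress = cmp, compression_level = 6))
    cat(cmp, ": ", round(tm["elapsed"], 1), " s, ",
        round(file.info(f)$size / 2^20, 1), " MB\n", sep = "")
}
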
Hope this helps a little.
Allan
On 23/07/10 17:10, babyfoxlove1 at sina.com wrote:
> Hi there,
>
> Sorry to bother those who are not interested in this problem.
>
> I'm dealing with a large data set, a file of more than 6 GB, and running regression tests on those data. I was wondering whether there are any efficient ways to read the data in, rather than just using read.table()? BTW, I'm using a 64-bit desktop and a 64-bit version of R, and the desktop has enough memory for me to use.
> Thanks.
>
>
> --Gin
>