[R] How to deal with more than 6GB dataset using R?
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Jul 23 18:36:05 CEST 2010
On 23/07/2010 12:10 PM, babyfoxlove1 at sina.com wrote:
> Hi there,
>
> Sorry to bother those who are not interested in this problem.
>
> I'm dealing with a large data set, a file of more than 6 GB, and running regressions on those data. I was wondering whether there are any efficient ways to read those data, instead of just using read.table()? BTW, I'm using a 64-bit desktop and a 64-bit version of R, and the desktop has enough memory for me.
> Thanks.
>
You probably won't get much faster than read.table with all of the
colClasses specified. It will be a lot slower if you leave that at the
default NA setting, because then R needs to figure out the types by
reading the values as character and examining them all. If the file is
very consistently structured (e.g. the same number of characters in
every value in every row) you might be able to write a C function to
read it faster, but I'd guess the time spent writing that would be a lot
more than the time saved.
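
For concreteness, a minimal sketch of what "read.table with all of the
colClasses specified" might look like; the file name, column types, and
row count below are made up for illustration:

  ## Hypothetical file: one integer id, two numeric measurements,
  ## and one character grouping variable.
  classes <- c("integer", "numeric", "numeric", "character")

  dat <- read.table("bigdata.txt",
                    header       = TRUE,
                    colClasses   = classes,  # skip the type-guessing pass
                    nrows        = 3e7,      # rough upper bound on row count
                    comment.char = "")       # no comment scanning, a small extra speedup

(Reading just the first few thousand rows with nrows = 5000 is a cheap
way to check that the classes are right before committing to the full
read.)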
Duncan Murdoch