[R] Reading large files quickly

Rob Steele freenx.10.robsteele at xoxy.net
Mon May 11 03:39:31 CEST 2009


At the moment I'm just reading the large file to see how fast it goes.
Eventually, if I can get the read time down, I'll write out a processed
version.  Thanks for suggesting scan(); I'll try it.
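
For reference, a minimal sketch of how the read could be timed on its own,
with no write step. It assumes a made-up fixed-width layout (id in columns
1-5, value in columns 6-13), since the real file's layout isn't given in this
thread. The demo file and read_in_chunks() are hypothetical names. The scan()
call only works here because the demo fields happen to be separated by
whitespace.

```r
# Hypothetical demo file of fixed-width records (the real layout is unknown):
# columns 1-5 hold an id, columns 6-13 hold a value.
path <- tempfile(fileext = ".txt")
n <- 20000L
writeLines(sprintf("%5d%8.2f", seq_len(n), seq_len(n) / 100), path)

# Read in chunks from a single open connection, so each readLines() call
# continues where the previous one stopped instead of rescanning the file.
read_in_chunks <- function(path, chunk_lines = 5000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0L
  value_sum <- 0
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    # substr() is vectorised, so the field is cut without a per-line loop.
    value <- as.numeric(substr(lines, 6L, 13L))
    total <- total + length(lines)
    value_sum <- value_sum + sum(value)
  }
  list(lines = total, value_sum = value_sum)
}

# Time the chunked read by itself.
chunk_timing <- system.time(res <- read_in_chunks(path))

# Time scan() with the column types declared up front.
scan_timing <- system.time(
  cols <- scan(path, what = list(id = integer(), value = numeric()),
               quiet = TRUE)
)

print(chunk_timing)
print(scan_timing)
```

Timings on a small temporary file say little about a 3.5 GB one; the point is
only to separate the cost of reading from the cost of parsing and writing.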

Rob

jim holtman wrote:
> Since you are reading it in chunks, I assume that you are writing out each
> segment as you read it in.  How are you writing it out to save it?  Is the
> time you are quoting both the reading and the writing?  If so, can you break
> down the differences in what these operations are taking?
> 
> How do you plan to use the data?  Is it all numeric?  Are you keeping it in
> a dataframe?  Have you considered using 'scan' to read in the data and to
> specify what the columns are?  If you would like some more help, the answers
> to these questions will help.
> 
> On Sat, May 9, 2009 at 10:09 PM, Rob Steele <freenx.10.robsteele at xoxy.net> wrote:
> 
>> Thanks guys, good suggestions.  To clarify, I'm running on a fast
>> multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
>> Paging shouldn't be an issue since I'm reading in chunks and not trying
>> to store the whole file in memory at once.  Thanks again.
>>
>> Rob Steele wrote:
>>> I'm finding that readLines() and read.fwf() take nearly two hours to
>>> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>>> The Unix command wc, by contrast, processes the same file in three
>>> minutes.  Is there a faster way to read files in R?
>>>
>>> Thanks!


