[R] How long does skipping in read.table take
dimitri.liakhovitski at gmail.com
Sat Oct 23 15:44:16 CEST 2010
Oh, I understand - I did not realize it's reading in the whole file.
So, is there any way to make it read the file in only once and then spit
into R just one piece at a time (e.g., 1 million rows), write a regular
file out (e.g., a txt using write.table), and then grab the next million?
I was planning to do something like this (I have 17+ million rows):
filename <- paste("million_number_", i, ".txt", sep = "")
mydata <- read.csv.sql("myfile.txt", sep = "|", eol = "\r\n",
                       sql = paste("select * from file limit 1000000 offset",
                                   (i - 1) * 1000000))
But if it reads in the whole file on each iteration, it'll take a very
long time... Is there any way to make it read the file in only once? I
guess not - because there is not enough memory to hold it all in the
first place...
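One way to spit the file into R a piece at a time without re-reading it is to keep a connection open and call read.table on it repeatedly with nrows - each call continues from where the previous one stopped, so the file is scanned only once. A minimal sketch (it builds a tiny stand-in file with a hypothetical chunk size of 3; in practice the file would be "myfile.txt" and chunk_size around 1e6):

```r
# Build a small demo file standing in for "myfile.txt": a header plus
# 10 pipe-delimited data rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a|b", paste(1:10, 11:20, sep = "|")), tmp)

con <- file(tmp, open = "r")
invisible(readLines(con, n = 1))     # consume the header line once
chunk_size <- 3                      # would be ~1e6 on the real file
chunks <- list()
repeat {
  chunk <- tryCatch(
    # Reading from an open connection resumes at the current position,
    # so the file is traversed only once across all iterations.
    read.table(con, sep = "|", nrows = chunk_size,
               stringsAsFactors = FALSE),
    error = function(e) NULL         # read.table errors at end of file
  )
  if (is.null(chunk)) break
  chunks[[length(chunks) + 1]] <- chunk
  # Here each chunk could be written out with write.table(...)
}
close(con)
length(chunks)                       # 4 chunks: 3 + 3 + 3 + 1 rows
```

The memory footprint is then bounded by one chunk rather than the whole 17M-row file.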
Thanks again for your advice!
On Sat, Oct 23, 2010 at 9:32 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Sat, Oct 23, 2010 at 9:20 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> This is very helpful, Gabor.
>> I've run the code to figure out the end of the line and here is what I
>> am seeing at the end of each line: \r\n
>> So, I specified like this:
>> mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
>> * from file limit 200, 100")
However, it's hanging again. Another typo?
> I wonder if it's just taking longer than you think. It does read the
> entire file into SQLite even if you only read a portion from SQLite
> into R, so if the file is very long it will still take some time. Try
> creating a small file of a few hundred lines from your file and
> experimenting on that until you get it working.
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
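Gabor's suggestion of experimenting on a small extract can be done with a couple of lines of base R - readLines with n grabs just the first few hundred lines without loading the whole file. A sketch (it uses temporary files standing in for "myfile.txt" and the small test file):

```r
# Demo stand-in for the real "myfile.txt": a header plus 500 data rows.
big <- tempfile(fileext = ".txt")
writeLines(c("a|b", paste(1:500, 501:1000, sep = "|")), big)

# Copy the header plus the first 300 data rows into a small test file;
# readLines(n = 301) stops early, so the big file is not fully read.
small <- tempfile(fileext = ".txt")  # stands in for "myfile_small.txt"
writeLines(readLines(big, n = 301), small)

length(readLines(small))             # 301
```

Once read.csv.sql works on the small file, the same call can be pointed at the full file.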