[R] How long does skipping in read.table take

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Sat Oct 23 16:07:00 CEST 2010


I just tried it:

for(i in 11:16){ #i<-11
 start<-Sys.time()
 print(start)
 flush.console()
 filename<-paste("skipped millions- ",i,".txt",sep="")
 mydata<-read.csv.sql("myfilel.txt", sep="|", eol="\r\n", sql =
"select * from file limit 1000000, (1000000*i-1)")
 write.table(mydata,file=filename,sep="\t",row.names=F)
 end<-Sys.time()
 print(end-start)
 flush.console()
}

It started running at 9:52 am.
Around 10:05 am I got this error:

Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: no such column: i)

What does it mean?
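My guess (and I could be wrong) is that the "i" inside the quoted sql string
never gets substituted by R, so SQLite looks for a column literally named i.
If that is right, building the query text with paste() first might work --
just a sketch, not tested, and the offset arithmetic is only my guess at the
chunking I want:

for(i in 11:16){ #i<-11
 start<-Sys.time()
 print(start)
 flush.console()
 filename<-paste("skipped millions- ",i,".txt",sep="")
 # build the SQL in R so i is replaced by its value before it reaches SQLite:
 # skip (i-1) million rows, then read the next million
 sql.txt<-paste("select * from file limit 1000000 offset ",(i-1)*1000000,sep="")
 mydata<-read.csv.sql("myfilel.txt", sep="|", eol="\r\n", sql = sql.txt)
 write.table(mydata,file=filename,sep="\t",row.names=F)
 end<-Sys.time()
 print(end-start)
 flush.console()
}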
Thank you!
Dimitri

On Sat, Oct 23, 2010 at 9:44 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Oh, I understand - I did not realize it's reading in the whole file.
> So, is there any way to make it read the file in only once and then spit
> into R just one piece (e.g., 1 million rows), write a regular file out
> (e.g., a txt using write.table), and then grab the next million?
> Because I was planning to do something like this (I have 17+ million rows):
>
> for(i in 1:16){
>  filename<-paste("million number ",i,".txt",sep="")
>  mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
> * from file limit 1000000, (1000000*i-1)")
>  write.table(mydata,file=filename,sep="\t",row.names=F)
> }
>
> But if on each iteration it reads in the whole file, it'll take a long
> time... Is there any way to make it read the
> file in only once? I guess not - because there is not enough memory to
> hold it in the first place...
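> Or maybe I could keep a single file connection open and let read.table
> pick up where it left off each time -- I think successive read.table()
> calls on an open connection keep reading from the current position.
> Just a sketch, untested (the chunk count and header handling would need
> checking):
>
> con <- file("myfile.txt", open = "r")
> # hdr <- readLines(con, n = 1)   # uncomment if the file has a header row
> for(i in 1:17){
>  # reads the next million rows from the open connection;
>  # the last call just reads whatever is left
>  chunk <- read.table(con, sep = "|", nrows = 1000000, header = FALSE,
>                      stringsAsFactors = FALSE)
>  filename <- paste("million number ", i, ".txt", sep = "")
>  write.table(chunk, file = filename, sep = "\t", row.names = F)
> }
> close(con)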
> Thanks again for your advice!
> Dimitri
>
>
>
> On Sat, Oct 23, 2010 at 9:32 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> On Sat, Oct 23, 2010 at 9:20 AM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>> This is very helpful, Gabor.
>>> I've run the code to figure out the end of the line and here is what I
>>> am seeing at the end of each line: \r\n
>>> So, I specified like this:
>>> mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
>>> * from file limit 200, 100")
>>>
>>> However, it's hanging again. Another typo?
>>>
>>
>> I wonder if it's just taking longer than you think.  It does read the
>> entire file into sqlite even if you only read a portion from sqlite to
>> R, so if the file is very long it will still take some time.  Try
>> creating a small file of a few hundred lines from your file and
>> experiment on that until you get it working.
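>>
>> For example, something along these lines (file names are just examples)
>> would pull the first few hundred lines into a small test file:
>>
>> small <- readLines("myfile.txt", n = 300)   # first 300 lines of the big file
>> writeLines(small, "myfile_small.txt")       # small file to experiment on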
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>>
>
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


