[R] How long does skipping in read.table take

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Sat Oct 23 16:08:26 CEST 2010


Oh, wait a sec - does it mean I can't feed my R objects into SQL commands?
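
On the error quoted below: the SQL text reaches SQLite as a literal string, so an `i` inside the quotes is looked up as a column name rather than as the R loop variable. Here is a minimal sketch of one way around it, assuming the statement is simply built in R before the call. The names `make_chunk_sql` and `chunk_size` are made up for illustration. Note also that in SQLite's comma form, `limit A, B`, the first number is the offset and the second is the row count, so the sketch uses the explicit `limit ... offset ...` form to avoid mixing them up.

```r
# Build the SQL text in R first; SQLite never sees the R variable `i`.
chunk_size <- 1000000  # rows per output file (assumed from the loop quoted below)

# Hypothetical helper: returns the SQL for the i-th chunk.
# i = 1 gives offset 0, i = 2 gives offset chunk_size, and so on.
make_chunk_sql <- function(i, chunk_size) {
  offset <- (i - 1) * chunk_size
  # "%.0f" prints plain digits; paste() or as.character() on 1e6
  # would produce "1e+06" instead.
  sprintf("select * from file limit %.0f offset %.0f", chunk_size, offset)
}

print(make_chunk_sql(11, chunk_size))
```

The resulting string could then be passed as the `sql` argument of read.csv.sql inside the loop. Each call would still load the whole file into SQLite before returning its chunk, as noted further down the thread.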

On Sat, Oct 23, 2010 at 10:07 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I just tried it:
>
> for(i in 11:16){ #i<-11
>  start<-Sys.time()
>  print(start)
>  flush.console()
>  filename<-paste("skipped millions- ",i,".txt",sep="")
>  mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql =
> "select * from file limit 1000000, (1000000*i-1)")
>  write.table(mydata,file=filename,sep="\t",row.names=F)
>  end<-Sys.time()
>  print(end-start)
>  flush.console()
> }
>
> It started running at 9:52 am.
> Around 10:05 am I got this error:
>
> Error in sqliteExecStatement(con, statement, bind.data) :
>  RS-DBI driver: (error in statement: no such column: i)
>
> What does it mean?
> Thank you!
> Dimitri
>
> On Sat, Oct 23, 2010 at 9:44 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Oh, I understand - I did not realize it's reading in the whole file.
>> So, is there any way to make it read it in only once and then spit into
>> R just one piece (e.g., 1 million rows), write a regular file out
>> (e.g., a txt using write.table), and then grab the next million?
>> Because I was planning to do something like this (I have 17+ million rows):
>>
>> for(i in 1:16){
>>  filename<-paste("million number ",i,".txt",sep="")
>>  mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
>> * from file limit 1000000, (1000000*i-1)")
>>  write.table(mydata,file=filename,sep="\t",row.names=F)
>> }
>>
>> But if each iteration reads in the whole file again, it'll take a
>> long time... Is there any way to make it read the
>> file in only once? I guess not - because there is not enough memory to
>> hold it in the first place...
>> Thanks again for your advice!
>> Dimitri
>>
>>
>>
>> On Sat, Oct 23, 2010 at 9:32 AM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>> On Sat, Oct 23, 2010 at 9:20 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>> This is very helpful, Gabor.
>>>> I've run the code to figure out the end of the line and here is what I
>>>> am seeing at the end of each line: \r\n
>>>> So, I specified like this:
>>>> mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
>>>> * from file limit 200, 100")
>>>>
>>>> However, it's hanging again. Another typo?
>>>>
>>>
>>> I wonder if it's just taking longer than you think.  It does read the
>>> entire file into sqlite even if you only read a portion from sqlite to
>>> R so if the file is very long it will still take some time.  Try
>>> creating a small file of a few hundred lines from your file and
>>> experiment on that until you get it working.
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


