[R] How long does skipping in read.table take
Dimitri Liakhovitski
dimitri.liakhovitski at gmail.com
Sat Oct 23 16:08:26 CEST 2010
Oh, wait a sec - does this mean I can't feed my R objects into SQL commands?
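
The "no such column: i" error quoted below suggests the SQL text reaches SQLite verbatim, so `i` is never replaced by its R value. Here is a minimal sketch of one workaround, assuming the query is built as an ordinary R string before the call. `chunk_query` is a hypothetical helper name, not part of sqldf. The explicit "limit <count> offset <skip>" form is used because SQLite reads "limit a, b" as "skip a, return b", which may not be what the loop below intended.

```r
# Hypothetical helper: build the SQL text for the i-th chunk of chunk_size rows.
chunk_query <- function(i, chunk_size = 1000000) {
  offset <- (i - 1) * chunk_size   # rows to skip before chunk i
  # %.0f prints whole numbers in full, without scientific notation
  sprintf("select * from file limit %.0f offset %.0f", chunk_size, offset)
}

# Show the statement that would be sent for the 11th chunk
print(chunk_query(11))

# Assumed usage inside the loop (needs the sqldf package, so it is not run here):
# mydata <- read.csv.sql("myfile.txt", sep = "|", eol = "\r\n",
#                        sql = chunk_query(i))
```

Building the string in R first also makes it easy to print the statement and check it before starting a long run.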
On Sat, Oct 23, 2010 at 10:07 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I just tried it:
>
> for(i in 11:16){ #i<-11
> start<-Sys.time()
> print(start)
> flush.console()
> filename<-paste("skipped millions- ",i,".txt",sep="")
> mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql =
> "select * from file limit 1000000, (1000000*i-1)")
> write.table(mydata,file=filename,sep="\t",row.names=F)
> end<-Sys.time()
> print(end-start)
> flush.console()
> }
>
> It started running at 9:52 am.
> Around 10:05 am I got this error:
>
> Error in sqliteExecStatement(con, statement, bind.data) :
> RS-DBI driver: (error in statement: no such column: i)
>
> What does it mean?
> Thank you!
> Dimitri
>
> On Sat, Oct 23, 2010 at 9:44 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Oh, I understand - I did not realize it's reading in the whole file.
>> So, is there any way to make it read it in only once and then spit into
>> R just one piece (e.g., 1 million rows), write a regular file out
>> (e.g., a txt using write.table), and then grab the next million?
>> Because I was planning to do something like this (I have 17+ million rows):
>>
>> for(i in 1:16){
>> filename<-paste("million number ",i,".txt",sep="")
>> mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
>> * from file limit 1000000, (1000000*i-1)")
>> write.table(mydata,file=filename,sep="\t",row.names=F)
>> }
>>
>> But if each iteration reads in the whole file again, the loop will
>> take a very long time. Is there any way to make it read the
>> file in only once? I guess not, because there is not enough memory to
>> hold it in the first place...
>> Thanks again for your advice!
>> Dimitri
>>
>>
>>
>> On Sat, Oct 23, 2010 at 9:32 AM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>> On Sat, Oct 23, 2010 at 9:20 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>> This is very helpful, Gabor.
>>>> I've run the code to figure out the end of the line and here is what I
>>>> am seeing at the end of each line: \r\n
>>>> So, I specified like this:
>>>> mydata<-read.csv.sql("myfile.txt", sep="|", eol="\r\n", sql = "select
>>>> * from file limit 200, 100")
>>>>
>>>> However, it's hanging again. Another typo?
>>>>
>>>
>>> I wonder if it's just taking longer than you think. It does read the
>>> entire file into sqlite even if you only read a portion from sqlite to
>>> R so if the file is very long it will still take some time. Try
>>> creating a small file of a few hundred lines from your file and
>>> experiment on that until you get it working.
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
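
On the read-it-only-once question in the quoted thread above: a minimal base-R sketch, assuming the file has a header line and contains no quoted fields or comment characters. It keeps one connection open and pulls a fixed number of lines per pass, so only one chunk is in memory at a time. `split_in_chunks` and its arguments are hypothetical names, and this uses plain connections rather than sqldf.

```r
# Hypothetical helper: split a delimited file into chunk files in a single pass.
split_in_chunks <- function(infile, chunk_size = 1000000, sep = "|",
                            prefix = "million number ") {
  con <- file(infile, open = "r")
  on.exit(close(con))                      # always release the connection
  header <- readLines(con, n = 1)          # assumption: first line holds column names
  i <- 0
  repeat {
    # The connection remembers its position, so each call continues
    # where the previous one stopped.
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break          # end of file reached
    i <- i + 1
    chunk <- read.table(text = c(header, lines), sep = sep, header = TRUE,
                        quote = "", comment.char = "",
                        stringsAsFactors = FALSE)
    write.table(chunk, file = paste(prefix, i, ".txt", sep = ""),
                sep = "\t", row.names = FALSE)
  }
  invisible(i)                             # number of chunk files written
}

# Assumed usage: split_in_chunks("myfile.txt")
```

Each chunk is parsed with the header prepended, so every output file carries the column names.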
--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com