[R] How long does skipping in read.table take

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Sat Oct 23 03:45:33 CEST 2010


Gabor,
thanks a lot - sqldf might be a solution. However, do you know if
sqldf can also read in .txt files (with different delimiters)?
The data I am dealing with is "|"-delimited, so I was using
read.table(..., sep="|").
I looked at the sqldf description but did not see any examples with .txt files.
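
For what it is worth, here is a minimal sketch of what I would try. It relies
on the sep argument that read.csv.sql takes alongside header, eol and skip.
The temporary file and its three columns are made up for illustration; the
real file name would go in its place.

```r
library(sqldf)

# Hypothetical "|"-delimited stand-in for the real .txt file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id|name|value",
             paste(1:10, letters[1:10], (1:10) * 1.5, sep = "|")),
           tmp)

# sep = "|" sets the field delimiter. In SQLite, "limit <offset>, <count>"
# skips <offset> data rows and then returns the next <count> rows.
part <- read.csv.sql(tmp, sql = "select * from file limit 3, 4",
                     header = TRUE, sep = "|")
print(part)

unlink(tmp)
```

Depending on the platform, eol may also need to be set to match the file's
line endings.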

Thanks a lot!
Dimitri

On Fri, Oct 22, 2010 at 6:28 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Fri, Oct 22, 2010 at 5:17 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> I know I could figure it out empirically - but maybe based on your
>> experience you can tell me if it's doable in a reasonable amount of
>> time:
>> I have a table (in .txt) with 17,000,000 rows (and 30 columns).
>> I can't read it all in (there are many strings). So I thought I could
>> read it in parts (e.g., 1 million rows at a time) using nrows= and skip=.
>> I was able to read in the first 1,000,000 rows no problem in 45 sec.
>> But then I tried to skip 16,999,999 rows and then read in things. Then
>> R crashed. Should I try again - or is it too many rows to skip for R?
>>
>
> You could try read.csv.sql in sqldf.
>
> library(sqldf)
> read.csv.sql("myfile.csv", skip = 1000, header = FALSE)
> or
> read.csv.sql("myfile.csv", sql = "select * from file limit 1000, 2000")
>
> The first skips the first 1000 lines including the header and the
> second one skips 1000 rows (but still reads in the header) and then
> reads 2000 rows.  You may or may not need to specify other arguments
> as well. For example, you may need to specify eol = "\n" or other
> depending on your line endings.
>
> Unlike read.csv, read.csv.sql reads the data directly into an sqlite
> database (which it creates on the fly for you).  The data does not go
> through R during this operation.  From there it reads only the data
> you ask for into R so R never sees the skipped over data.  After all
> that it automatically deletes the database.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
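
For the read-it-in-parts idea from the original question, a base R sketch
that avoids skip= altogether: read.table is pointed at an already open
connection, so each call continues where the previous one stopped instead of
re-reading the skipped rows from the top of the file. The sample file, the
column names and chunk_size are placeholders, and this is only a sketch, not
something timed on 17,000,000 rows.

```r
# Hypothetical "|"-delimited stand-in for the real file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id|name|value",
             paste(1:10, letters[1:10], (1:10) * 1.5, sep = "|")),
           tmp)

con <- file(tmp, open = "r")

# Read the header line once to get the column names.
col_names <- strsplit(readLines(con, n = 1), "|", fixed = TRUE)[[1]]

chunk_size <- 4   # placeholder; something like 1e6 for the real file
rows_seen <- 0
repeat {
  # Each call reads the next chunk_size lines from the open connection.
  # tryCatch guards against an error once the input is exhausted.
  chunk <- tryCatch(
    read.table(con, sep = "|", nrows = chunk_size, header = FALSE,
               col.names = col_names, stringsAsFactors = FALSE),
    error = function(e) NULL)
  if (is.null(chunk)) break
  rows_seen <- rows_seen + nrow(chunk)      # process the chunk here
  if (nrow(chunk) < chunk_size) break       # a short chunk means end of file
}

close(con)
unlink(tmp)
```

Column types are guessed separately for every chunk, so passing colClasses
explicitly is probably wise if the chunks are to be combined later.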



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


