[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?

Prof Brian Ripley r|p|ey @end|ng |rom @t@t@@ox@@c@uk
Fri Jan 5 22:57:17 CET 2007


On Fri, 5 Jan 2007, Seth Falcon wrote:

> "David James" <daj025 using gmail.com> writes:
>
>> Hi,
>>
>> A word of caution:  relying on read.table() to read very large
>> files (even piecemeal) can be extremely inefficient.  The reason
>> I implemented the dbWriteTable from a file is exactly to circumvent
>> this problem.  You can upload a huge data file into the db
>> extremely fast, but going through read.table() could be considerably
>> slower.
>
> I wonder if things have changed at all in that regard.  I would expect
> read.table to be slower, but, if called carefully, not _extremely_
> slower.  We should do some timings...

I would be surprised if read.table used carefully took a significant part 
of the time of a total analysis.  I hesitate to do timings without knowing 
the sort of table you are discussing: does it have many columns or many 
rows or both, and what variable types?

Having some real-life examples to think about would be very helpful (as 
it would be for some of the efficiency issues we have been working on 
with data frames).

> My concern with the .import-like code is on the maintenance
> side and in terms of providing all of the same features as read.table
> (at which point, perhaps the performance gain will diminish).
>
>> Perhaps we're considering two slightly different situations:
>> (1) export a data.frame to sqlite (here, the prepared statement/binding
>> approach is optimal).
>
> The prepared statement/binding approach is also the thing to do when
> you have a large text file that you want to import, but you need to do
> some massaging along the way (perhaps you only want parts of it,
> perhaps you want to load into two or more tables, etc).
>
>> (2) load a csv file into R or SQLite (here using the C code from .import
>> is optimal).
>
> I can see the use-case, but have some reservations.  Without the
> functionality we have now you could:
>
>  a. use SQLite directly and skip R.
>  b. use R and make a system call to the sqlite command line.

Or
    c. Send a suitable SQL query from the R package. I've done that in
       RMySQL and RODBC in the past (e.g. using LOAD DATA INFILE in MySQL).

> The dbWriteTable functionality is more elegant than (a) or (b), and
> perhaps I would have fewer reservations if it was called
> sqliteImportFile() -- do the other DBI packages support file import in
> this way via dbWriteTable?
>
> + seth
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>

-- 
Brian D. Ripley,                  ripley using stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-sig-DB mailing list