[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?

Seth Falcon @|@|con @end|ng |rom |hcrc@org
Fri Jan 5 22:27:26 CET 2007


"David James" <daj025 using gmail.com> writes:

> Hi,
>
> A word of caution:  relying on read.table() to read very large
> files (even piecemeal) can be extremely inefficient.  The reason
> I implemented the dbWriteTable from a file is exactly to circumvent
> this problem.  You can upload a huge data file into the db
> extremely fast, but going through read.table() could be considerably
> slower.

I wonder if things have changed at all in that regard.  I would expect
read.table to be slower, but, if called carefully, not _extremely_
slower.  We should do some timings...

My concern with the .import-like code is on the maintenance
side and in terms of providing all of the same features as read.table
(at which point, perhaps the performance gain will diminish).

> Perhaps we're considering two slightly different situations:
> (1) export a data.frame to sqlite (here, the prepared statement/binding
> approach is optimal).

The prepared statement/binding approach is also the thing to do when
you have a large text file that you want to import, but you need to do
some massaging along the way (perhaps you only want parts of it,
perhaps you want to load into two or more tables, etc).

> (2) load a csv file into R or SQLite (here using the C code from .import
> is optimal).

I can see the use-case, but have some reservations.  Without the
functionality we have now you could:

  a. use SQLite directly and skip R.
  b. use R and make a system call to the sqlite command line.

The dbWriteTable functionality is more elegant than (a) or (b), and
perhaps I would have fewer reservations if it was called
sqliteImportFile() -- do the other DBI packages support file import in
this way via dbWriteTable?

+ seth




More information about the R-sig-DB mailing list