[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?
ronggui.huang at gmail.com
Sat Jan 6 11:14:29 CET 2007
On 1/6/07, Seth Falcon <sfalcon at fhcrc.org> wrote:
> "David James" <daj025 at gmail.com> writes:
> > Hi,
> > A word of caution: relying on read.table() to read very large
> > files (even piecemeal) can be extremely inefficient. The reason
> > I implemented the dbWriteTable from a file is exactly to circumvent
> > this problem. You can upload a huge data file into the db
> > extremely fast, but going through read.table() could be considerably
> > slower.
> I wonder if things have changed at all in that regard. I would expect
> read.table to be slower, but, if called carefully, not _extremely_
> slower. We should do some timings...
Yeah, Things change a lot.
I use system.time to do the timings on a large csv file (365M, with
The result is as follow:
1, use read.table and then dbWriteTable in chunck.
 700.22 16.06 768.01 NA NA
2, use bdWriteTable to import file directly.
 48.88 5.30 128.39 NA NA
This is one-shot test, and the result maybe is no reliable enough, but
it indicates a huge difference.
This is a working method, but not efficient. Of course, I agree that
from the maintainer-perspective, this method is prefered:)
> My concern with the .import-like code is on the maintenance
> side and in terms of providing all of the same features as read.table
> (at which point, perhaps the performance gain will diminish).
> > Perhaps we're considering two slightly different situations:
> > (1) export a data.frame to sqlite (here, the prepared statement/binding
> > approach is optimal).
> The prepared statement/binding approach is also the thing to do when
> you have a large text file that you want to import, but you need to do
> some massaging along the way (perhaps you only want parts of it,
> perhaps you want to load into two or more tables, etc).
> > (2) load a csv file into R or SQLite (here using the C code from .import
> > is optimal).
> I can see the use-case, but have some reservations. Without the
> functionality we have now you could:
> a. use SQLite directly and skip R.
> b. use R and make a system call to the sqlite command line.
> The dbWriteTable functionality is more elegant than (a) or (b), and
> perhaps I would have fewer reservations if it was called
> sqliteImportFile() -- do the other DBI packages support file import in
> this way via dbWriteTable?
> + seth
Department of Sociology
Fudan University, Shanghai, China
More information about the R-sig-DB