[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?

ronggui ronggui.huang at gmail.com
Sat Jan 6 11:14:29 CET 2007


On 1/6/07, Seth Falcon <sfalcon at fhcrc.org> wrote:
> "David James" <daj025 at gmail.com> writes:
>
> > Hi,
> >
> > A word of caution:  relying on read.table() to read very large
> > files (even piecemeal) can be extremely inefficient.  The reason
> > I implemented the dbWriteTable from a file is exactly to circumvent
> > this problem.  You can upload a huge data file into the db
> > extremely fast, but going through read.table() could be considerably
> > slower.
>
> I wonder if things have changed at all in that regard.  I would expect
> read.table to be slower, but, if called carefully, not _extremely_
> slower.  We should do some timings...

Yeah, Things change a lot.
I use system.time to do the timings on a large csv file (365M, with
sep="\t", quote="")

The result is as follow:
1, use read.table and then dbWriteTable in chunck.
> t1
[1] 700.22  16.06 768.01     NA     NA
2, use bdWriteTable to import file directly.
> t2
[1]  48.88   5.30 128.39     NA     NA

This is one-shot test, and the result maybe is no reliable enough, but
it indicates a huge difference.

This is a working method, but not efficient. Of course, I agree that
from the maintainer-perspective, this method is prefered:)

Ronggui Huang

> My concern with the .import-like code is on the maintenance
> side and in terms of providing all of the same features as read.table
> (at which point, perhaps the performance gain will diminish).
>
> > Perhaps we're considering two slightly different situations:
> > (1) export a data.frame to sqlite (here, the prepared statement/binding
> > approach is optimal).
>
> The prepared statement/binding approach is also the thing to do when
> you have a large text file that you want to import, but you need to do
> some massaging along the way (perhaps you only want parts of it,
> perhaps you want to load into two or more tables, etc).
>
> > (2) load a csv file into R or SQLite (here using the C code from .import
> > is optimal).
>
> I can see the use-case, but have some reservations.  Without the
> functionality we have now you could:
>
>   a. use SQLite directly and skip R.
>   b. use R and make a system call to the sqlite command line.
>
> The dbWriteTable functionality is more elegant than (a) or (b), and
> perhaps I would have fewer reservations if it was called
> sqliteImportFile() -- do the other DBI packages support file import in
> this way via dbWriteTable?
>
> + seth
>


-- 
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China
黄荣贵
复旦大学社会学系




More information about the R-sig-DB mailing list